Jiayi Geng

jiayig@princeton.edu
I am a first-year PhD student at the Language Technologies Institute (LTI) at Carnegie Mellon University, advised by Prof. Graham Neubig.
My research explores how to build reliable machine intelligence that can advance toward and beyond human-level capabilities. I primarily focus on: (1) understanding and ensuring reliability in long-horizon interactions, (2) designing agent memory systems that adaptively update with experience while avoiding unintended shifts, (3) enabling multi-agent collaboration through effective coordination and communication mechanisms, and (4) advancing AI scientists through rigorous evaluation and safe deployment for conducting autonomous research.
Before CMU, I received my Master's degree from Princeton University, advised by Prof. Danqi Chen and Prof. Thomas L. Griffiths, and my Bachelor's degree from McGill University, advised by Prof. Xue (Steve) Liu and Prof. Eric D. Kolaczyk.
News
| Date | News |
|---|---|
| 2025-05 | Graduated from Princeton University and started my PhD at LTI, CMU! |
| 2025-05 | Our paper Mind Your Step (by Step): Chain-of-Thought Can Reduce Performance on Tasks Where Thinking Makes Humans Worse was accepted at ICML 2025! |
| 2025-01 | Our paper Large Language Models Assume People Are More Rational than We Really Are was accepted at ICLR 2025! |
| 2024-05 | Our paper Language Models as Science Tutors was accepted at ICML 2024! |
| 2023-09 | Started my Master's studies at Princeton University! |
Selected publications
(* indicates equal contribution)
- Are Large Language Models Reliable AI Scientists? Assessing Reverse-Engineering of Black-Box Systems. arXiv preprint arXiv:2505.17968, 2025.
- A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence. arXiv preprint arXiv:2507.21046, 2025.
- Using the Tools of Cognitive Science to Understand Large Language Models at Different Levels of Analysis. arXiv preprint arXiv:2503.13401, 2025.
- Mind Your Step (by Step): Chain-of-Thought Can Reduce Performance on Tasks Where Thinking Makes Humans Worse. ICML, 2025.
- TreeBoN: Enhancing Inference-Time Alignment with Speculative Tree-Search and Best-of-N Sampling. arXiv preprint arXiv:2410.16033, 2024.