publications

(* indicates equal contribution)

2025

Physics Supernova: AI Agent Matches Elite Gold Medalists at IPhO 2025

Jiahao Qiu, Jingzhe Shi, Xinzhe Juan, Zelin Zhao, Jiayi Geng, Shilong Liu, Hongru Wang, Sanfeng Wu, and Mengdi Wang

arXiv preprint arXiv:2509.01659, 2025

Are Large Language Models Reliable AI Scientists? Assessing Reverse-Engineering of Black-Box Systems

Jiayi Geng*, Howard Chen*, Dilip Arumugam, and Thomas L Griffiths

arXiv preprint arXiv:2505.17968, 2025

A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence

Huan-ang Gao, Jiayi Geng, Wenyue Hua, Mengkang Hu, Xinzhe Juan, Hongzhang Liu, Shilong Liu, Jiahao Qiu, Xuan Qi, Yiran Wu, and 1 more author

arXiv preprint arXiv:2507.21046, 2025

Using the tools of cognitive science to understand large language models at different levels of analysis

Alexander Ku, Declan Campbell, Xuechunzi Bai, Jiayi Geng, 8 authors, and Thomas L Griffiths

arXiv preprint arXiv:2503.13401, 2025

Mind Your Step (by Step): Chain-of-Thought Can Reduce Performance on Tasks Where Thinking Makes Humans Worse

Ryan Liu*, Jiayi Geng*, Addison J Wu, Ilia Sucholutsky, Tania Lombrozo, and Thomas L Griffiths

ICML, 2025

Large Language Models Assume People are More Rational than We Really are

Ryan Liu*, Jiayi Geng*, Joshua C Peterson, Ilia Sucholutsky, and Thomas L Griffiths

ICLR, 2025


2024

Continual Memorization of Factoids in Large Language Models

Howard Chen*, Jiayi Geng*, Adithya Bhaskar, Dan Friedman, and Danqi Chen

arXiv preprint arXiv:2411.07175, 2024

TreeBoN: Enhancing Inference-Time Alignment with Speculative Tree-Search and Best-of-N Sampling

Jiahao Qiu, Yifu Lu, Yifan Zeng, Jiacheng Guo, Jiayi Geng, Huazheng Wang, Kaixuan Huang, Yue Wu, and Mengdi Wang

arXiv preprint arXiv:2410.16033, 2024

Dr. GPT in Campus Counseling: Understanding Higher Education Students’ Opinions on LLM-assisted Mental Health Services

Owen Xingjian Zhang, Shuyao Zhou, Jiayi Geng, Yuhan Liu, and Sunny Xun Liu

arXiv preprint arXiv:2409.17572, 2024

Language Models as Science Tutors

Alexis Chevalier, Jiayi Geng, Alexander Wettig, Howard Chen, 16 authors, Sanjeev Arora, and Danqi Chen

ICML, 2024


2023

CORGI-PM: A Chinese Corpus for Gender Bias Probing and Mitigation

Ge Zhang, Yizhi Li, Yaoyao Wu, Linyuan Zhang, Chenghua Lin, Jiayi Geng, Shi Wang, and Jie Fu

arXiv preprint arXiv:2301.00395, 2023
