Publications
(† indicates equal contribution)

Alita-G: Self-Evolving Generative Agent for Agent Generation
Jiahao Qiu, Xuan Qi, Hongru Wang, Xinzhe Juan, Yimin Wang, Zelin Zhao, Jiayi Geng, Jiacheng Guo, Peihang Li, Jingzhe Shi, Shilong Liu, Mengdi Wang
Preprint
A self-evolution framework that transforms a general-purpose agent into a domain expert by generating, abstracting, and curating MCP tools.

A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence
Huan-ang Gao†, Jiayi Geng†, Wenyue Hua†, Mengkang Hu†, Xinzhe Juan†, Hongzhang Liu†, Shilong Liu†, Jiahao Qiu†, Xuan Qi†, Qihan Ren†, Yiran Wu†, Hongru Wang†, Han Xiao†, Yuhang Zhou†, Shaokun Zhang†, Jiayi Zhang, Jinyu Xiang, Yixiong Fang, Qiwen Zhao, Dongrui Liu, Cheng Qian, Zhenhailong Wang, Minda Hu, Huazheng Wang, Qingyun Wu, Heng Ji, Mengdi Wang
TMLR 2026
A comprehensive survey of self-evolving agents and their path toward artificial super intelligence.

Are Large Language Models Reliable AI Scientists? Assessing Reverse-Engineering of Black-Box Systems
Jiayi Geng†, Howard Chen†, Dilip Arumugam, Thomas L Griffiths
LM4Sci COLM Workshop 2025
Evaluating whether LLMs are reliable AI scientists by assessing how well they reverse-engineer black-box systems.

Mind Your Step (by Step): Chain-of-Thought Can Reduce Performance on Tasks Where Thinking Makes Humans Worse
Ryan Liu†, Jiayi Geng†, Addison J Wu, Ilia Sucholutsky, Tania Lombrozo, Thomas L Griffiths
ICML 2025
Demonstrating that chain-of-thought prompting can reduce LLM performance on tasks where deliberate thinking hurts human performance.

Using the Tools of Cognitive Science to Understand Large Language Models at Different Levels of Analysis
Alexander Ku, Declan Campbell, Xuechunzi Bai, Jiayi Geng, Ryan Liu, Raja Marjieh, R. Thomas McCoy, Andrew Nam, Ilia Sucholutsky, Veniamin Veselovsky, Liyi Zhang, Jian-Qiao Zhu, Thomas L Griffiths
Preprint
Applying cognitive science tools to understand LLMs at multiple levels of analysis.

TreeBoN: Enhancing Inference-Time Alignment with Speculative Tree-Search and Best-of-N Sampling
Jiahao Qiu, Yifu Lu, Yifan Zeng, Jiacheng Guo, Jiayi Geng, Huazheng Wang, Kaixuan Huang, Yue Wu, Mengdi Wang
EMNLP Findings 2025
A method combining speculative tree-search with best-of-N sampling for better inference-time alignment.

Dr. GPT in Campus Counseling: Understanding Higher Education Students' Opinions on LLM-assisted Mental Health Services
Owen Xingjian Zhang, Shuyao Zhou, Jiayi Geng, Yuhan Liu, Sunny Xun Liu
Preprint
Understanding student opinions on LLM-assisted mental health services in campus counseling.

Language Models as Science Tutors
Alexis Chevalier, Jiayi Geng, Alexander Wettig, Howard Chen, Sebastian Mizera, Toni Annala, Max Jameson Aragon, Arturo Rodriguez Fanlo, Simon Frieder, Simon Machado, Akshara Prabhakar, Ellie Thieu, Jiachen T Wang, Zirui Wang, Xindi Wu, Mengzhou Xia, Wenhan Xia, Jiatong Yu, Jun-Jie Zhu, Zhiyong Jason Ren, Sanjeev Arora, Danqi Chen
ICML 2024
Exploring the effectiveness of language models as science tutors.




