publications
(* indicates equal contribution)
2025
- Physics Supernova: AI Agent Matches Elite Gold Medalists at IPhO 2025. arXiv preprint arXiv:2509.01659, 2025
- Are Large Language Models Reliable AI Scientists? Assessing Reverse-Engineering of Black-Box Systems. arXiv preprint arXiv:2505.17968, 2025
- A survey of self-evolving agents: On path to artificial super intelligence. arXiv preprint arXiv:2507.21046, 2025
- Using the tools of cognitive science to understand large language models at different levels of analysis. arXiv preprint arXiv:2503.13401, 2025
- Mind your step (by Step): Chain-of-thought Can Reduce Performance on Tasks Where Thinking Makes Humans Worse. ICML, 2025
2024
- TreeBoN: Enhancing Inference-Time Alignment with Speculative Tree-Search and Best-of-N Sampling. arXiv preprint arXiv:2410.16033, 2024
- Dr. GPT in Campus Counseling: Understanding Higher Education Students’ Opinions on LLM-assisted Mental Health Services. arXiv preprint arXiv:2409.17572, 2024
2023
- Corgi-pm: A Chinese Corpus for Gender Bias Probing and Mitigation. arXiv preprint arXiv:2301.00395, 2023