Selected publications: 'Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought' (NeurIPS 2025), 'Offline Reinforcement Learning for LLM Multi-Step Reasoning' (ACL 2025), 'Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective' (NeurIPS 2025), and 'Training Large Language Models to Reason in a Continuous Latent Space' (COLM 2025). Awards: ToolkenGPT received the best paper award at NeurIPS 2023.
Research Experience
Research scientist intern at Meta FAIR lab, mentored by Yuandong Tian and Jason Weston. Involved in multiple research projects, including Guru, OREO, FoR, and Coconut.
Education
Ph.D. student at UC San Diego, advised by Zhiting Hu; B.S. in Computer Science from Peking University.
Background
Research interests: machine reasoning. Work includes training large language models to reason with reinforcement learning, exploring reasoning in latent space, building a system-2 reasoning framework using world-model planning, and augmenting LLMs with external tools.