Winner of the Best Long Paper Award (1 out of 1,250 long-paper submissions) at NAACL 2021. Contributed to multiple research works, including curiosity-driven exploration in RLVR and automatic theorem proving and math agents such as HunyuanProver and MPS-Prover (NeurIPS 2025). Also contributed to improving the scaling efficiency of search and learning through works such as AlphaLLM (NeurIPS 2024) and LiteSearch (AAAI 2025). Published papers on industrial-level post-training practices, e.g., 'Stabilizing RLHF' and 'Collaborative Decoding'.
Research Experience
Staff Engineer at Google, responsible for effective post-training methods and intelligent agent framework development. Previously a Principal Researcher at Tencent AI Lab, focusing on large language model post-training and reinforcement learning over reasoning and agentic tasks.
Education
Ph.D. in Computer Science from the University of Rochester, advised by Professor Daniel Gildea; Master’s degree from the Institute of Computing Technology, Chinese Academy of Sciences, under the mentorship of Dr. Qun Liu.
Background
A Staff Engineer at Google, focusing on effective and efficient post-training methods and developing intelligent agent frameworks capable of performing complex tasks, such as reasoning over complex questions (DeepResearch) and generating high-quality documents and slides (Long Horizon RL). Formerly a Principal Researcher at Tencent AI Lab, working on LLM post-training and RL over reasoning and agentic tasks.