Publications are available on Google Scholar.
Resume (English only)
Academic Achievements
Proposed and open-sourced online rejection-sampling fine-tuning, the Reinforce-ada adaptive sampling framework, online DPO, and a regret analysis of KL-regularized RL. Co-founded and led the open-source project RLHFlow, which has earned 2,000 GitHub stars, 500 academic citations, and 1 million Hugging Face downloads. Released the first open-source recipe for generative process reward models.
Research Experience
Served as a Research Intern at Meta FAIR (May 2025 to August 2025), teaching LLMs to segment reasoning trajectories into coherent intermediate steps for improved interpretability and stability of reasoning, and training a generative process reward model via RL to evaluate and guide step-by-step reasoning. Also worked as a Student Researcher on Google DeepMind's Gemini Post-Training team (May 2024 to April 2025), formulating a multi-turn RL framework for agent tasks.
Education
Currently a Ph.D. candidate in Computer Science at the University of Illinois Urbana-Champaign, advised by Prof. Tong Zhang and Prof. Nan Jiang; received a Master's degree in Mathematics from The Hong Kong University of Science and Technology in 2023, supported by the Hong Kong PhD Fellowship; obtained a B.S. in Mathematics from the University of Science and Technology of China in 2021, working closely with Prof. Cong Shen.
Background
Research interests include reinforcement learning and its applications in LLM post-training, with a focus on the design of core RL algorithms and the development of practical training methods. Also interested in understanding the training dynamics and mathematical foundations of these methods, with the goal of improving large-scale training stability and final model performance.