Browse publications on Google Scholar (top-right) ↗
Resume (English only)
Academic Achievements
1. Proposed DiFFPO, an efficient and effective off-policy RL method for enhancing the reasoning of diffusion LLMs (dLLMs).
2. Developed CT-PPO in continuous-time and continuous-space theory and applied it to fine-tuning diffusion models (Score as Action).
3. Generalized preference modeling and optimization beyond Bradley-Terry using the Mallows ranking model.
4. Proposed RainbowPO, a unified perspective on the design space of offline RLHF algorithms.
5. Contributed to noise schedule design and convergence analysis of diffusion models (CDPM, tutorials).
6. Selected as a NeurIPS 2025 Top Reviewer.
7. Paper 'Diffusion Fast and Furious Policy Optimization (DiFFPO)' is available on arXiv, and a short version was accepted at the NeurIPS 2025 Efficient Reasoning workshop.
Research Experience
Summer research internships at Netflix (2025) and Capital One (2024).
Education
Ph.D. candidate, Department of IEOR, Columbia University, advised by Professor Wenpin Tang and Professor David D. Yao; M.S. in Financial Engineering, Columbia University; B.S. in Mathematics, Fudan University.
Background
Main research interests are in reinforcement learning (RL) and generative models (both LLMs and diffusion models), with a focus on expanding the design space of algorithms from first mathematical principles and by leveraging the structural properties of the underlying models. Currently seeking researcher opportunities in AI.