Hanyang Zhao

Google Scholar ID: ipCfUaQAAAAJ
Columbia University
Research interests: Reinforcement Learning, Diffusion Models
Citations & Impact (all-time)
  • Citations: 194
  • H-index: 8
  • i10-index: 8
  • Publications: 11
  • Co-authors: 4
Resume (English only)
Academic Achievements
  • 1. Proposed DiFFPO, an efficient and effective off-policy RL method for enhancing the reasoning of diffusion LLMs (dLLMs).
  • 2. Developed CT-PPO, a PPO theory in continuous time and space, and applied it to fine-tuning diffusion models in Score as Action.
  • 3. Generalized preference modeling and optimization beyond Bradley-Terry using the Mallows ranking model.
  • 4. Proposed RainbowPO, a unified perspective on the design space of offline RLHF algorithms.
  • 5. Designed noise schedules and analyzed the convergence of diffusion models: CDPM, Tutorials.
  • 6. Selected as a NeurIPS 2025 Top Reviewer.
  • 7. Paper "Diffusion Fast and Furious Policy Optimization (DiFFPO)" is available on arXiv; a short version was accepted at the NeurIPS 2025 Efficient Reasoning workshop.
Research Experience
  • Summer research internships at Netflix (2025) and Capital One (2024).
Education
  • Ph.D. candidate, Department of IEOR, Columbia University, advised by Professor Wenpin Tang and Professor David D. Yao; M.S. in Financial Engineering, Columbia University; B.S. in Mathematics, Fudan University.
Background
  • Main research interests are in reinforcement learning (RL) and generative models (both LLMs and diffusion models), with a focus on improving the design space of algorithms from first mathematical principles and by leveraging the structural properties of the underlying models. Currently seeking researcher opportunities in AI.
Miscellany
  • Email: hz2684@columbia.edu
  • Fan of the Fast & Furious series