Browse publications on Google Scholar (top-right) ↗
Resume (English only)
Academic Achievements
1. Proposed DiFFPO, an efficient and effective off-policy RL method for enhancing the reasoning of diffusion LLMs (dLLMs).
2. Developed CT-PPO in continuous-time and continuous-space theory and applied it to fine-tuning diffusion models (Score as Action).
3. Generalized preference modeling and optimization beyond Bradley-Terry using the Mallows ranking model.
4. Proposed RainbowPO, a unified perspective on the design space of offline RLHF algorithms.
5. Contributed to noise schedule design and convergence analysis of diffusion models (CDPM, tutorials).
6. Selected as a NeurIPS 2025 Top Reviewer.
7. Paper 'Diffusion Fast and Furious Policy Optimization (DiFFPO)' is available on arXiv, and a short version was accepted at the NeurIPS 2025 Efficient Reasoning workshop.
Research Experience
Summer research internships at Netflix (2025) and Capital One (2024).
Education
Ph.D. candidate, Department of IEOR, Columbia University, advised by Professor Wenpin Tang and Professor David D. Yao; M.S. in Financial Engineering, Columbia University; B.S. in Mathematics, Fudan University.
Background
Main research interests are in reinforcement learning (RL) and generative models (both LLMs and diffusion models), with a focus on expanding the design space of algorithms from first mathematical principles and by leveraging the structural properties of the underlying models. Currently seeking researcher opportunities in AI.