- Mogao: An Omni Foundation Model for Interleaved Multi-Modal Generation
- Emu3: Next-Token Prediction is All You Need
- End-to-End Alternating Optimization for Real-World Blind Super Resolution
- VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation
- Learning the Degradation Distribution for Deep Blind Super Resolution
- Efficient Human Pose Estimation by Learning Deeply Aggregated Representations
- Rethinking the Heatmap Regression for Bottom-up Human Pose Estimation
- Unfolding the Alternating Optimization for Blind Super Resolution
Research Experience
Worked as a full-time researcher at the Beijing Academy of Artificial Intelligence (BAAI) from July 2023 to February 2025; currently a full-time researcher at ByteDance Seed.
Education
Bachelor's degree in Mechanical Engineering from Shanghai Jiao Tong University (SJTU); Ph.D. from the Institute of Automation, Chinese Academy of Sciences (CASIA).
Background
Current research focuses on unifying multimodal generation and understanding. Previous work has covered human pose estimation, low-level vision tasks, and video generation.