- TalkCuts: A Large-Scale Dataset for Multi-Shot Human Speech Video Generation. NeurIPS, 2025
- RapVerse: Coherent Vocals and Whole-Body Motions Generations from Text. ICCV, 2025
- SportsSloMo: A New Benchmark and Baselines for Human-centric Video Frame Interpolation. CVPR, 2024
- Deceptive-NeRF/3DGS: Diffusion-Generated Pseudo-Observations for High-Quality Sparse-View Reconstruction. ECCV, 2024
- RoboDreamer: Learning Compositional World Models for Robot Imagination. ICML, 2024
- UniMuMo: Unified Text, Music and Motion Generation. AAAI, 2025
- Revisiting Event-based Video Frame Interpolation. IROS, 2024
Research Experience
- Research intern at Meta Reality Labs in 2025, working with Chengde Wan.
Education
- Ph.D. in Computer Science at UMass Amherst, advised by Prof. Chuang Gan
- Master's degree in Computer Science from the CSE department of the University of California, San Diego, mentored by Prof. Xiaolong Wang
- Bachelor's degree in Computer Science and Technology from SIST, ShanghaiTech University; research intern with Prof. Jianbo Shi at the University of Pennsylvania during junior and senior years
Background
Primary research interests: multi-modality learning and video synthesis.