Daquan Zhou

Google Scholar ID: DdCAbWwAAAAJ
ByteDance, US
Artificial Intelligence · Deep Learning
Citations & Impact
All-time
  • Citations: 14,360
  • H-index: 36
  • i10-index: 46
  • Publications: 20
  • Co-authors: 5
Resume (English only)
Academic Achievements
  • HunyuanVideo: A Systematic Framework For Large Video Generative Models
  • StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation
  • Magic-Me: Identity-Specific Video Customized Diffusion
  • MagicVideo & MagicVideo-V2: Efficient High-Aesthetic Video Generation with Latent Diffusion
  • DiffFit: Unlocking transferability of large diffusion models via parameter-efficient fine-tuning (CVPR 2023)
  • Expanding small-scale datasets with guided imagination (NeurIPS 2023, Corresponding Author & Project Lead)
  • Diffusion Probabilistic Model Made Slim (CVPR 2023)
  • Dataset Quantization (ICCV 2023)
  • Scaling & Shifting Your Features (NeurIPS 2022 Spotlight, Equal First Author)
Research Experience
  • Led model pre-training and diffusion algorithm design for HunyuanVideo
  • Key contributor to video generation projects including StoryDiffusion, Magic-Me, MagicVideo series
  • Pioneered research on parameter-efficient fine-tuning (e.g., DiffFit), small-data expansion, and diffusion model slimming
  • Developed PLLaVA: a parameter-free extension of LLaVA from images to videos for dense video captioning
  • Proposed Dataset Quantization pipeline achieving 5×–10× training speedup (ICCV 2023)
Background
  • Currently a Tenure-track Assistant Professor at Peking University
  • Focused on minimizing energy and memory consumption for training and deploying powerful AI algorithms
  • Applications include Robotics, AIGC, and Vision-Language-Action (VLA) systems
  • Research interests: explainable video representation design and efficient long video generation (both training and inference)
  • Strong interest in hardware-algorithm co-design, especially DNN architecture and memory co-optimization
  • Ongoing work on model and dataset efficiency for discriminative, generative, and multimodal models