Chengyao Wang

Google Scholar ID: 1pZcoqgAAAAJ
The Chinese University of Hong Kong
Multimodal Intelligence
Citations & Impact (all-time)
  • Citations: 1,158
  • H-index: 7
  • i10-index: 7
  • Publications: 11
  • Co-authors: 15
Academic Achievements
  • Aug 2025: Released MGM-Omni, an open-source omni-modal LLM supporting long speech understanding, generation, and zero-shot voice cloning
  • Jun 2025: Paper 'Lyra' accepted to ICCV 2025
  • Mar 2025: Papers 'VisionZip' and 'DreamOmni' accepted to CVPR 2025
  • Dec 2024: Released Lyra, an open-source MLLM supporting long speech comprehension, omni-modal understanding, and cross-modality efficiency
  • Jul 2024: Paper 'LLaMA-VID' accepted to ECCV 2024
  • Mar 2024: Released Mini-Gemini, an open-source vision-language model supporting high-resolution image understanding and reasoning-based image generation
  • Feb 2024: Paper 'GroupContrast' accepted to CVPR 2024
  • Nov 2023: Released LLaMA-VID, an open-source vision-language model supporting hour-long video understanding and reasoning
  • Primary contributor to key projects including MGM-Omni, Lyra, VisionZip, Mini-Gemini, LLaMA-VID, DreamOmni series, and GroupContrast
Background
  • PhD student at the Department of Computer Science and Engineering, The Chinese University of Hong Kong (CUHK)
  • Research interests focus on building Human-like Multimodal Intelligence capable of actively interacting with the physical world, learning from interaction, and possessing long-term memory
  • Recently concentrating on Multimodal Large Language Models (MLLMs)
  • Previously worked on visual perception
  • Seeking Research Scientist / Member of Technical Staff positions in industry for Fall 2026, focused on Multimodal Foundation Models and related applications (e.g., Computer Use Agents, Embodied AI); open to any location