Mu Cai

Google Scholar ID: euruCPEAAAAJ
Google DeepMind
Computer Vision, Machine Learning, Multimodal
Citations & Impact
All-time
  • Citations: 2,423
  • H-index: 19
  • i10-index: 21
  • Publications: 20
  • Co-authors: 24
Resume (English only)
Academic Achievements
  • Has published papers at top international conferences such as ICCV, CVPR, and ICLR. Listed works include 'Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities', 'Toward Versatile and Efficient Multimodal Models' (PhD thesis), and 'LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models'.
Research Experience
  • Research Scientist at Google DeepMind, working on the Gemini Multimodal project.
Education
  • Received a Ph.D. in Computer Sciences from the University of Wisconsin-Madison, advised by Prof. Yong Jae Lee.
Background
  • Research interests include multimodal models and vision-language models.
Miscellany
  • Recent talk videos on criticizing and creating vision-language models are available. Listed contact channels: email, GitHub, Google Scholar, LinkedIn, Twitter (X), and blog.