Mingfei Gao

Google Scholar ID: kMe-G5AAAAAJ
Apple Inc.
Computer Vision, Deep Learning
Citations & Impact (all-time)
  • Citations: 3,214
  • H-index: 21
  • i10-index: 23
  • Publications: 20
  • Co-authors: 12
Resume (English only)
Academic Achievements
  • Paper: MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer (arXiv preprint, 2025)
  • Tech Report: Apple Intelligence Foundation Language Models (tech report, 2025)
  • Paper: UniGen: Enhanced Training & Test-Time Strategies for Unified Multimodal Understanding and Generation (NeurIPS, 2025)
  • Paper: SlowFast-LLaVA-1.5: A Family of Token-Efficient Video Large Language Models for Long-Form Video Understanding (COLM, 2025)
  • Paper: MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning (ICLR, 2025)
  • Paper: SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models (arXiv preprint, 2024)
  • Paper: 4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities (NeurIPS, 2024)
  • Paper: 4M: Massively Multimodal Masked Modeling (NeurIPS, 2023, Spotlight)
  • Paper: ULIP: Learning Unified Representation of Language, Image and Point Cloud for 3D Understanding (CVPR, 2023)
  • Paper: Mask-free OVIS: Open-Vocabulary Instance Segmentation without Manual Mask Annotations (CVPR, 2023)
  • Paper: TAG: Boosting Text-VQA via Text-aware Visual Question-answer Generation (British Machine Vision Conference, incomplete)
Research Experience
  • Staff Research Scientist, Apple Inc.
  • Senior Research Scientist, Salesforce Research, Palo Alto, USA
Education
  • PhD in Computer Science, University of Maryland, College Park, advised by Prof. Larry S. Davis.
Background
  • Mingfei Gao is a Staff Research Scientist at Apple working on multimodal foundation models. Previously, she was a Senior Research Scientist at Salesforce Research in Palo Alto, USA.