Mingfei Gao

Google Scholar ID: kMe-G5AAAAAJ

Apple Inc.

Computer Vision, Deep Learning

Citations & Impact

Citations (all-time): 3,214

Publications

6 items

2025 (5 items)

Journal of Computer Science · 2019

Resume (English only)

Academic Achievements

- Paper: MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer (arXiv preprint, 2025)
- Tech Report: Apple Intelligence Foundation Language Models (tech report, 2025)
- Paper: UniGen: Enhanced Training & Test-Time Strategies for Unified Multimodal Understanding and Generation (NeurIPS, 2025)
- Paper: SlowFast-LLaVA-1.5: A Family of Token-Efficient Video Large Language Models for Long-Form Video Understanding (COLM, 2025)
- Paper: MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning (ICLR, 2025)
- Paper: SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models (arXiv preprint, 2024)
- Paper: 4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities (NeurIPS, 2024)
- Paper: 4M: Massively Multimodal Masked Modeling (NeurIPS, 2023, Spotlight)
- Paper: ULIP: Learning Unified Representation of Language, Image and Point Cloud for 3D Understanding (CVPR, 2023)
- Paper: Mask-free OVIS: Open-Vocabulary Instance Segmentation without Manual Mask Annotations (CVPR, 2023)
- Paper: TAG: Boosting Text-VQA via Text-aware Visual Question-answer Generation (British Machine Vision Conference, incomplete)

Research Experience

Education

PhD in Computer Science, University of Maryland, College Park, advised by Prof. Larry S. Davis.

Background

Mingfei Gao is a Staff Research Scientist at Apple working on multimodal foundation models. Previously, she was a Senior Research Scientist at Salesforce Research in Palo Alto, USA.

Co-authors

12 total