Mingze Xu
Google Scholar ID: KNcECJQAAAAJ
Adobe Firefly
Computer Vision · Machine Learning
Citations & Impact
All-time
Citations: 2,899
H-index: 21
i10-index: 26
Publications: 20
Co-authors: 15
Resume (English only)
Academic Achievements
  • Selected Publications:
    - AToken: A Unified Tokenizer for Vision (Technical Report, 2025)
    - UniGen: Enhanced Training & Test-Time Strategies for Unified Multimodal Understanding and Generation (NeurIPS, 2025)
    - StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant (NeurIPS, 2025)
    - SlowFast-LLaVA-1.5: A Family of Token-Efficient Video Large Language Models for Long-Form Video Understanding (COLM, 2025)
    - MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning (ICLR, 2025)
    - SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models (Technical Report, 2024)
    - SkeleTR: Towards Skeleton-based Action Recognition in the Wild (ICCV, 2023)
    - An In-depth Study of Stochastic Backpropagation (NeurIPS, 2022)
Research Experience
  • Senior Applied Scientist, Adobe Firefly. Previously worked or interned at Apple, Cruise, Amazon, and Microsoft Research.
Education
  • Ph.D. in Computer Science, Indiana University, 2020, advised by Prof. David Crandall. Visiting Student Researcher, Georgia Institute of Technology, 2018, working with Prof. Dhruv Batra and Prof. Devi Parikh.
Background
  • Research interests: computer vision and machine learning. Current focus: developing unified encoders and LLMs across multiple modalities (text, image, video, and 3D) for both understanding and generative tasks.
Miscellany
  • Hiring Applied Scientists in Multimodal LLMs and GenAI, for both full-time and intern positions!