Mingze Xu
Google Scholar ID: KNcECJQAAAAJ
Adobe Firefly
Computer Vision · Machine Learning
Citations & Impact
All-time
Citations: 2,899
H-index: 21
i10-index: 26
Publications: 20
Co-authors: 15
Resume (English only)
Academic Achievements
  • Selected Publications:
    - AToken: A Unified Tokenizer for Vision (Technical Report, 2025)
    - UniGen: Enhanced Training & Test-Time Strategies for Unified Multimodal Understanding and Generation (NeurIPS, 2025)
    - StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant (NeurIPS, 2025)
    - SlowFast-LLaVA-1.5: A Family of Token-Efficient Video Large Language Models for Long-Form Video Understanding (COLM, 2025)
    - MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning (ICLR, 2025)
    - SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models (Technical Report, 2024)
    - SkeleTR: Towards Skeleton-based Action Recognition in the Wild (ICCV, 2023)
    - An In-depth Study of Stochastic Backpropagation (NeurIPS, 2022)
Research Experience
  • Senior Applied Scientist, Adobe Firefly. Previously worked or interned at Apple, Cruise, Amazon, and Microsoft Research.
Education
  • Ph.D. in Computer Science, Indiana University, 2020, advised by Prof. David Crandall. Visiting Student Researcher, Georgia Institute of Technology, 2018, working with Prof. Dhruv Batra and Prof. Devi Parikh.
Background
  • Research interests: computer vision and machine learning. Current focus: developing unified encoders and LLMs across multiple modalities (text, image, video, and 3D) for both understanding and generative tasks.
Miscellany
  • Hiring Applied Scientists in Multimodal LLMs and GenAI, for both full-time and intern positions!