- Awarded the Google PhD Fellowship 2025 in Machine Perception
- Released Perception Language Model (PLM)
- Released VideoGPT+ model, dataset, and benchmark
- Released LLaVA++
- Perception Language Model (PLM) accepted as a Spotlight and Perception Encoder accepted as an Oral at NeurIPS 2025
- Video-ChatGPT accepted at ACL 2024
- GLaMM accepted at CVPR 2024
- Published papers: VideoMathQA, PerceptionLM, VideoGPT+, Video-ChatGPT, and Mobile-VideoGPT, among others
Research Experience
- Ph.D. Candidate in the Computer Vision Department at MBZUAI
- Research Scientist Intern at Meta, working with Christoph Feichtenhofer
Education
- Ph.D. in Computer Vision, MBZUAI; Advisors: Dr. Salman Khan and Prof. Fahad Khan
Background
Research Interests: Developing multimodal large language models (MLLMs) for detailed and long-form video understanding and multimodal reasoning.