- Awarded the Google PhD Fellowship 2025 in Machine Perception
- Released Perception Language Model (PLM)
- Released VideoGPT+ model, dataset, and benchmark
- Released LLaVA++
- Perception Language Model (PLM) accepted as a Spotlight and Perception Encoder accepted as an Oral at NeurIPS 2025
- Video-ChatGPT accepted at ACL 2024
- GLaMM accepted at CVPR 2024
- Published papers: VideoMathQA, PerceptionLM, VideoGPT+, Video-ChatGPT, and Mobile-VideoGPT, among others
Research Experience
- Ph.D. Candidate in the Computer Vision Department at MBZUAI
- Research Scientist Intern at Meta, working with Christoph Feichtenhofer
Education
- Ph.D. in Computer Vision, MBZUAI; Advisors: Dr. Salman Khan and Prof. Fahad Khan
Background
Research Interests: Developing multimodal large language models (MLLMs) for detailed and long-form video understanding and multimodal reasoning.