Arsha Nagrani
Google Scholar ID: -_2vpWwAAAAJ
Research Scientist, Google
Machine Learning · Computer Vision · Speech Technology · Deep Learning
Citations & Impact (all-time)
  • Citations: 15,579
  • H-index: 43
  • i10-index: 59
  • Publications: 20
  • Co-authors: 9
Academic Achievements
  • Recipient of the ELLIS PhD Award for doctoral thesis.
  • Awarded the Google PhD Fellowship during doctoral studies.
  • Published papers at top-tier conferences (NeurIPS, ICCV, CVPR, ACL, ECCV, and Interspeech), including:
      — VidChapters-7M (NeurIPS 2023): Introduced a large-scale dataset and tasks for video chapterization.
      — PaLI-X (arXiv 2023): Scaled up multilingual vision-language models, achieving SOTA on 25+ benchmarks.
      — UnLoc (ICCV 2023): A unified framework for video localization tasks using image-text models such as CLIP.
      — AutoAD series (CVPR/ICCV 2023): Automatic audio description for movies, with a focus on character recognition and contextual understanding.
      — Vid2Seq (CVPR 2023): A single-stage dense video captioning model pretrained on narrated videos, achieving SOTA performance.
      — Modular VQA via Code Generation (ACL 2023): Used LLMs to generate executable code for visual question answering, setting new records on COVR and GQA.
      — AVFormer (CVPR 2023): Enabled zero-shot audiovisual ASR by injecting visual features into frozen speech models.
      — LanSER (Interspeech 2023): Leveraged LLMs to derive emotion labels from speech for improved speech emotion recognition.