Scholar

Arsha Nagrani

Google Scholar ID: -_2vpWwAAAAJ

Research Scientist, Google

Machine Learning, Computer Vision, Speech Technology, Deep Learning

Citations & Impact

All-time

Citations: 15,579

H-index

i10-index

Publications

Co-authors

list available

Publications

11 items

2026: 3 items
2025: 5 items

Resume (English only)

Academic Achievements

Recipient of the ELLIS PhD Award for doctoral thesis.
Awarded the Google PhD Fellowship during doctoral studies.
Published multiple papers at top-tier conferences including NeurIPS, ICCV, CVPR, ACL, ECCV, and Interspeech, such as:
— VidChapters-7M (NeurIPS 2023): Introduced a large-scale dataset and tasks for video chapterization.
— PaLI-X (arXiv 2023): Scaled up multilingual vision-language models, achieving SOTA on 25+ benchmarks.
— UnLoc (ICCV 2023): A unified framework for video localization tasks using image-text models like CLIP.
— AutoAD series (CVPR/ICCV 2023): Automatic audio description for movies, with a focus on character recognition and contextual understanding.
— Vid2Seq (CVPR 2023): A single-stage dense video captioning model pretrained on narrated videos, achieving SOTA performance.
— Modular VQA via Code Generation (ACL 2023): Used LLMs to generate executable code for visual question answering, setting new records on COVR and GQA.
— AVFormer (CVPR 2023): Enabled zero-shot audiovisual ASR by injecting vision into frozen speech models.
— LanSER (Interspeech 2023): Leveraged LLMs to derive emotion labels from speech for improved speech emotion recognition.

Co-authors

9 total