Ali Vosoughi
Google Scholar ID: uyqE3LEAAAAJ
University of Rochester PhD | Microsoft Research & Bosch AI | ML Research Scientist
Multimodal AI
Audio AI
Large Language Models (LLMs)
Generative AI
Computer Vision
Homepage
Google Scholar
Citations & Impact (all-time)
- Citations: 432
- H-index: 10
- i10-index: 10
- Publications: 20
- Co-authors: 30
Contact
- GitHub
- LinkedIn
Publications (8 items)
- PromptReverb: Multimodal Room Impulse Response Generation Through Latent Rectified Flow Matching (2025). Citations: 0
- Diagnosing Visual Reasoning: Challenges, Insights, and a Path Forward (2025). Citations: 0
- Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models (2025). Citations: 0
- OPENXRD: A Comprehensive Benchmark and Enhancement Framework for LLM/MLLM XRD Question Answering (2025). Citations: 0
- MMPerspective: Do MLLMs Understand Perspective? A Comprehensive Benchmark for Perspective Perception, Reasoning, and Robustness (2025). Citations: 0
- $I^2G$: Generating Instructional Illustrations via Text-Conditioned Diffusion (2025). Citations: 0
- Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting (2025). Citations: 0
- Quality Over Quantity? LLM-Based Curation for a Data-Efficient Audio-Video Foundation Model (2025). Citations: 0
Resume (English only)
Academic Achievements
- Published VERIFY benchmark, Mar 2025
- Presented at SANE 2024 (DeepMind Boston), Oct 2024
- ACM Multimedia 2024 paper accepted, Oct 2024
- Research presentation at Microsoft, Seattle, Aug 2024
- NAACL 2024 paper accepted, Mar 2024
- IEEE Transactions on Multimedia paper, Feb 2024
- Two ICCV 2023 papers accepted, Aug 2023
- Nominated for Donald M. and Janet C. Barnard Fellowship, Apr 2022
- Published multiple research works including VERIFY, EAGLE, Cross Modality Bias in Visual Question Answering, etc.
Research Experience
- Research Scientist Intern, Smule AI, working on Spatial Audio Generation, Jun–Sep 2025
- Research Intern, Microsoft Research, working on Audiovisual LLM, May–Aug 2024
- Research Intern, Bosch AI Research, working on Audio LLM, Apr–Jul 2023
- Graduate Researcher, DARPA PTG, working on Autonomous AR Copilot, 2022–present
Education
Ph.D. Candidate at the University of Rochester, advised by Prof. Chenliang Xu.
Background
Research Interests: Agentic AI Systems, Computer Audition, Multimodal Reasoning, Multimodal Generation, Immersive Computing, Reasoning Verification, Reinforcement Learning, Large Action Models, Audio Generation, Video Generation.
Miscellany
Links to GitHub, Google Scholar (345 citations), LinkedIn, HuggingFace profile.
Co-authors
30 total
Chenliang Xu
Associate Professor of Computer Science, University of Rochester
Jing Bi
University of Rochester
Luchuan Song
University of Rochester
Pinxin Liu
University of Rochester
Co-author 5
Chao Huang
University of Rochester
Mingqian Feng
University of Rochester
Zeliang Zhang
PhD Candidate @ University of Rochester; BEng @ HUST