Ali Vosoughi
Google Scholar ID: uyqE3LEAAAAJ
University of Rochester PhD | Microsoft Research & Bosch AI | ML Research Scientist
Multimodal AI
Audio AI
Large Language Models (LLMs)
Generative AI
Computer Vision
Homepage
Google Scholar
Citations & Impact (all-time)
- Citations: 432
- H-index: 10
- i10-index: 10
- Publications: 20
- Co-authors: 30
Contact
- GitHub
- LinkedIn
Publications (8 items)
- PromptReverb: Multimodal Room Impulse Response Generation Through Latent Rectified Flow Matching (2025). Citations: 0
- Diagnosing Visual Reasoning: Challenges, Insights, and a Path Forward (2025). Citations: 0
- Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models (2025). Citations: 0
- OPENXRD: A Comprehensive Benchmark and Enhancement Framework for LLM/MLLM XRD Question Answering (2025). Citations: 0
- MMPerspective: Do MLLMs Understand Perspective? A Comprehensive Benchmark for Perspective Perception, Reasoning, and Robustness (2025). Citations: 0
- $I^2G$: Generating Instructional Illustrations via Text-Conditioned Diffusion (2025). Citations: 0
- Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting (2025). Citations: 0
- Quality Over Quantity? LLM-Based Curation for a Data-Efficient Audio-Video Foundation Model (2025). Citations: 0
Resume (English only)
Academic Achievements
- Published VERIFY benchmark, Mar 2025
- Presented at SANE 2024 (DeepMind Boston), Oct 2024
- ACM Multimedia 2024 paper accepted, Oct 2024
- Research presentation at Microsoft, Seattle, Aug 2024
- NAACL 2024 paper accepted, Mar 2024
- IEEE Transactions on Multimedia paper, Feb 2024
- Two ICCV 2023 papers accepted, Aug 2023
- Nominated for Donald M. and Janet C. Barnard Fellowship, Apr 2022
- Published multiple research works including VERIFY, EAGLE, Cross Modality Bias in Visual Question Answering, etc.
Research Experience
- Research Scientist Intern, Smule AI, working on Spatial Audio Generation, Jun–Sep 2025
- Research Intern, Microsoft Research, working on Audiovisual LLM, May–Aug 2024
- Research Intern, Bosch AI Research, working on Audio LLM, Apr–Jul 2023
- Graduate Researcher, DARPA PTG, working on Autonomous AR Copilot, 2022–present
Education
Ph.D. Candidate at the University of Rochester, advised by Prof. Chenliang Xu.
Background
Research Interests: Agentic AI Systems, Computer Audition, Multimodal Reasoning, Multimodal Generation, Immersive Computing, Reasoning Verification, Reinforcement Learning, Large Action Models, Audio Generation, Video Generation.
Miscellany
Links to GitHub, Google Scholar (345 citations), LinkedIn, HuggingFace profile.
Co-authors
30 total
Chenliang Xu
Associate Professor of Computer Science, University of Rochester
Jing Bi
University of Rochester
Luchuan Song
University of Rochester
Pinxin Liu
University of Rochester
Co-author 5
Chao Huang
University of Rochester
Mingqian Feng
University of Rochester
Zeliang Zhang
PhD Candidate @ University of Rochester; BEng @ HUST