News
Paper 'Face-LLaVA: Facial Expression and Attribute Understanding through Instruction Tuning' accepted at WACV 2026 (Round-1 early acceptance, 6.4% acceptance rate)
Paper accepted at EMNLP 2025 (Findings)
Paper 'DiTaiListener: Controllable High Fidelity Listener Video Generation with Diffusion' accepted at ICCV 2025
Paper accepted at WACV 2025
Paper accepted at ASRU 2023
Paper accepted at Interspeech 2022
Research Experience
PhD Researcher, Intelligent Human Perception Lab, Institute for Creative Technologies, USC (since Aug 2024)
Founding Research Engineer at Anoki AI (from Apr 2023), working on multimodal content understanding and retrieval
Data Scientist at LG Ad Solutions (from Jul 2021), working on speaker recognition, audio-based automatic content recognition, and voice cloning
Interned at Adobe Research
Interned at Vision and AI Lab, IISc Bengaluru, advised by Prof. R. Venkatesh Babu
Interned at IIT Roorkee, advised by Prof. R. Balasubramanian
Background
CS PhD student at the Institute for Creative Technologies, University of Southern California
Advised by Prof. Mohammad Soleymani at the Intelligent Human Perception Lab
Research focuses on post-training techniques (e.g., preference optimization) for multimodal (audio and video) LLMs to improve emotion and social understanding
Collaborates on diffusion-based video generation projects for modeling social behaviors
Research interests: multimodal LLM tuning and post-training, emotion understanding, and social AI