Ashutosh Chaubey

Google Scholar ID: 8g_xYb0AAAAJ
CS PhD, University of Southern California
Computer Vision, Multimodal AI, Speech Processing
Citations & Impact
All-time
  • Citations: 91
  • H-index: 4
  • i10-index: 1
  • Publications: 10
  • Co-authors: 4 (list available)
Resume (English only)
Academic Achievements
  • Paper 'Face-LLaVA: Facial Expression and Attribute Understanding through Instruction Tuning' accepted at WACV 2026 (Round-1 early acceptance, 6.4% acceptance rate)
  • Paper accepted at EMNLP 2025 (Findings)
  • Paper 'DiTaiListener: Controllable High Fidelity Listener Video Generation with Diffusion' accepted at ICCV 2025
  • Paper accepted at WACV 2025
  • Paper accepted at ASRU 2023
  • Paper accepted at Interspeech 2022
Research Experience
  • PhD Researcher, Intelligent Human Perception Lab, Institute for Creative Technologies, USC (since Aug 2024)
  • Founding Research Engineer at Anoki AI (from Apr 2023), working on multimodal content understanding and retrieval
  • Data Scientist at LG Ad Solutions (from Jul 2021), working on speaker recognition, automatic content recognition using audio, and voice cloning
  • Interned at Adobe Research
  • Interned at Vision and AI Lab, IISc Bengaluru, advised by Prof. R. Venkatesh Babu
  • Interned at IIT Roorkee, advised by Prof. R. Balasubramanian
Background
  • CS PhD student at the Institute for Creative Technologies, University of Southern California
  • Advised by Prof. Mohammad Soleymani at the Intelligent Human Perception Lab
  • Research focuses on post-training techniques (e.g., preference optimization) for multimodal (audio and video) LLMs to enhance social and emotion understanding
  • Collaborates on diffusion-based video generation projects for modeling social behaviors
  • Research interests: multimodal LLM tuning and post-training, emotion understanding, social AI