News
Paper 'Face-LLaVA: Facial Expression and Attribute Understanding through Instruction Tuning' accepted at WACV 2026 (Round-1 early acceptance, 6.4% acceptance rate)
Paper accepted at EMNLP 2025 (Findings)
Paper 'DiTaiListener: Controllable High Fidelity Listener Video Generation with Diffusion' accepted at ICCV 2025
Paper accepted at WACV 2025
Paper accepted at ASRU 2023
Paper accepted at Interspeech 2022
Research Experience
PhD Researcher, Intelligent Human Perception Lab, Institute for Creative Technologies, USC (since Aug 2024)
Founding Research Engineer at Anoki AI (from Apr 2023), working on multimodal content understanding and retrieval
Data Scientist at LG Ad Solutions (from Jul 2021), working on speaker recognition, audio-based automatic content recognition, and voice cloning
Interned at Adobe Research
Interned at Vision and AI Lab, IISc Bengaluru, advised by Prof. R. Venkatesh Babu
Interned at IIT Roorkee, advised by Prof. R. Balasubramanian
Background
CS PhD student at the Institute for Creative Technologies, University of Southern California
Advised by Prof. Mohammad Soleymani at the Intelligent Human Perception Lab
Research focuses on post-training techniques (e.g., preference optimization) for multimodal (audio and video) LLMs to improve emotion and social understanding
Collaborates on diffusion-based video generation projects for modeling social behaviors
Research interests: multimodal LLM tuning and post-training, emotion understanding, and social AI