Resume
Academic Achievements
- Publications:
  - Fork-Merge Decoding: Enhancing Multimodal Understanding in Audio-Visual Large Language Models (Preprint, 2025)
  - AVCD: Mitigating Hallucinations in Audio-Visual Large Language Models through Contrastive Decoding (NeurIPS, 2025)
  - InfiniteAudio: Infinite-Length Audio Generation with Consistent Acoustic Attributes (Interspeech, 2025)
  - SEED: Speaker Embedding Enhancement Diffusion Model (Interspeech, 2025)
  - From Faces to Voices: Learning Hierarchical Representations for High-Quality Video-to-Speech (CVPR, 2025)
  - VoiceDiT: Dual-Condition Diffusion Transformer for Environment-Aware Speech Synthesis (ICASSP, 2025)
  - FlowAVSE: Efficient Audio-Visual Speech Enhancement with Conditional Flow Matching (Interspeech, 2024)
  - Seeing Through the Conversation: Audio-Visual Speech Separation Based on Diffusion Model (ICASSP, 2024)
  - TalkNCE: Improving Active Speaker Detection with Talk-Aware Contrastive Learning (ICASSP, 2024)
Research Experience
- Research Projects: Multimodal learning and generative modeling in audio applications
Education
- Degree: Ph.D. Student
- University: KAIST
- Advisor: Professor Joon Son Chung
- Time: Started in March 2025
- Major: Multimedia and Artificial Intelligence (MMAI)
Background
- Research Interests: Multimodal learning, particularly deepening the understanding and reasoning capabilities of multimodal large language models (MLLMs)
- Professional Field: Generative modeling in audio, including text-to-audio generation, speech enhancement, source separation, and lip-to-speech synthesis