Published 'Paralinguistics-Aware Speech-Empowered Large Language Models for Natural Conversation' at NeurIPS 2024, introducing USDM, a paralinguistics-aware spoken dialog model.
Published 'Does Your Voice Assistant Remember? Analyzing Conversational Context Recall and Utilization in Voice Interaction Models' in Findings of ACL 2025, proposing the ContextDialog benchmark to evaluate how well voice interaction models recall and use conversational context.
Oral presentation at INTERSPEECH 2023: 'UnitSpeech: Speaker-adaptive Speech Synthesis with Untranscribed Data', enabling personalized TTS and any-to-any voice conversion with only 5–10 seconds of untranscribed speech.
Published 'Guided-TTS: A Diffusion Model for Text-to-Speech via Classifier Guidance' at ICML 2022, which trains a diffusion model on untranscribed long-form speech and steers generation toward the target text with classifier guidance.
Published 'VoiceTailor: Lightweight Plug-In Adapter for Diffusion-Based Personalized Text-to-Speech' at INTERSPEECH 2024, proposing a parameter-efficient one-shot speaker adaptation method using low-rank adapters.
Background
Currently a Senior Research Engineer at Qualcomm AI Research Korea, developing more human-like, real-time voice agents.
Has broad research interests centered primarily on multimodal large language models for speech and audio.
Particularly interested in speech large language models (speech LLMs) and spoken dialog models.
Has worked consistently with diffusion models across past and current research.
Previous work centered on speech synthesis, including text-to-speech (TTS) and voice conversion, with an emphasis on personalization and data efficiency.