Heeseung Kim
Google Scholar ID: 4ojbJpoAAAAJ
Seoul National University
Deep generative models · Spoken language models · Speech synthesis
Citations & Impact (all-time)
  • Citations: 581
  • H-index: 10
  • i10-index: 10
  • Publications: 19
  • Co-authors: 8
Academic Achievements
  • Published 'Paralinguistics-Aware Speech-Empowered Large Language Models for Natural Conversation' at NeurIPS 2024, introducing USDM—a paralinguistic-aware spoken dialog model.
  • Published 'Does Your Voice Assistant Remember? Analyzing Conversational Context Recall and Utilization in Voice Interaction Models' in ACL Findings 2025, proposing the ContextDialog benchmark to evaluate context recall in voice models.
  • Oral presentation at INTERSPEECH 2023: 'UnitSpeech: Speaker-adaptive Speech Synthesis with Untranscribed Data', enabling personalized TTS and any-to-any voice conversion with only 5–10 seconds of untranscribed speech.
  • Published 'Guided-TTS: A Diffusion Model for Text-to-Speech via Classifier Guidance' at ICML 2022, which trains a diffusion-based TTS model on untranscribed long-form speech by steering generation with classifier guidance.
  • Published 'VoiceTailor: Lightweight Plug-In Adapter for Diffusion-Based Personalized Text-to-Speech' at INTERSPEECH 2024, proposing a parameter-efficient one-shot speaker adaptation method using low-rank adapters.
Background
  • Currently a Senior Research Engineer at Qualcomm AI Research Korea, developing more human-like, real-time voice agents.
  • Has broad research interests, primarily focused on multimodal large language models in speech and audio.
  • Particularly interested in speech large language models (speech LLMs) and spoken dialog models.
  • Diffusion models have been a consistent thread across both past and current research.
  • Previous work centered on speech synthesis, including text-to-speech (TTS) and voice conversion, with emphasis on personalization and data efficiency.