Minsu Kim

Google Scholar ID: TXB0FyoAAAAJ
Google DeepMind
Multimodal Learning, Audio-Visual Speech Processing, Generative AI
Citations & Impact
All-time
  • Citations: 929
  • H-index: 19
  • i10-index: 22
  • Publications: 20
  • Co-authors: 28
Resume
Academic Achievements
  • International Journal Publications:
  • - TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages
  • - Prompt Tuning of Deep Neural Networks for Speaker-adaptive Visual Speech Recognition
  • - Textless Unit-to-Unit training for Many-to-Many Multilingual Speech-to-Speech Translation
  • - AKVSR: Audio Knowledge Empowered Visual Speech Recognition by Compressing Audio Knowledge of a Pretrained Model
  • - CroMM-VSR: Cross-Modal Memory Augmented Visual Speech Recognition
  • - Speech Reconstruction with Reminiscent Sound via Visual Voice Memory
  • International Conference Papers:
  • - MoME: Mixture of Matryoshka Experts for Audio-Visual Speech Recognition
  • - Adaptive Audio-Visual Speech Recognition via Matryoshka-Based Multimodal LLMs
  • - Zero-AVSR: Zero-Shot Audio-Visual Speech Recognition with LLMs by Learning Language-Agnostic Speech Representations
  • - Revival with Voice: Multi-modal Controllable Text-to-Speech Synthesis
  • - Scaling and Enhancing LLM-based AVSR: A Sparse Mixture of Projectors Approach
  • - Contextual Speech Extraction: Leveraging Textual History as an Implicit Cue for Target Speech Extraction
  • - Large Language Models are Strong Audio-Visual Speech Recognition Learners
Research Experience
  • - Google DeepMind, Tokyo, Japan (May 2025 - Present)
  • Position: Research Scientist
  • - Meta, London, UK (May 2024 - Mar. 2025)
  • Position: Postdoctoral Researcher
  • - Carnegie Mellon University (CMU), Pittsburgh, USA (Aug. 2023 - Oct. 2023)
  • Position: Visiting Scholar
  • Mentor: Prof. Shinji Watanabe
Education
  • - KAIST, Daejeon, Korea (Feb. 2019 - Feb. 2024)
  • Degree: Ph.D. in Electrical Engineering
  • Advisor: Prof. Yong Man Ro
  • - Yonsei Univ., Seoul, Korea (Feb. 2013 - Feb. 2019)
  • Degree: B.S. in Electrical and Electronic Engineering
  • Graduated with High Honors / Early Graduation (1 year)
Background
  • Research Interests: Multi-modal AI, Gen AI, Audio-Visual Speech Processing
  • Position: Research Scientist at Google DeepMind