Minsu Kim
Google Scholar ID: TXB0FyoAAAAJ
Google DeepMind
Multimodal Learning
Audio-Visual Speech Processing
Generative AI
Homepage
Google Scholar
Citations & Impact
All-time citations: 929
H-index: 19
i10-index: 22
Publications: 20
Co-authors: 28
Contact
GitHub
LinkedIn
Publications
- STRIDE: When to Speak Meets Sequence Denoising for Streaming Video Understanding (2026), cited by 0
Resume (English only)
Academic Achievements
International Journal Publications:
- TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages
- Prompt Tuning of Deep Neural Networks for Speaker-adaptive Visual Speech Recognition
- Textless Unit-to-Unit Training for Many-to-Many Multilingual Speech-to-Speech Translation
- AKVSR: Audio Knowledge Empowered Visual Speech Recognition by Compressing Audio Knowledge of a Pretrained Model
- CroMM-VSR: Cross-Modal Memory Augmented Visual Speech Recognition
- Speech Reconstruction with Reminiscent Sound via Visual Voice Memory
International Conference Papers:
- MoME: Mixture of Matryoshka Experts for Audio-Visual Speech Recognition
- Adaptive Audio-Visual Speech Recognition via Matryoshka-Based Multimodal LLMs
- Zero-AVSR: Zero-Shot Audio-Visual Speech Recognition with LLMs by Learning Language-Agnostic Speech Representations
- Revival with Voice: Multi-modal Controllable Text-to-Speech Synthesis
- Scaling and Enhancing LLM-based AVSR: A Sparse Mixture of Projectors Approach
- Contextual Speech Extraction: Leveraging Textual History as an Implicit Cue for Target Speech Extraction
- Large Language Models are Strong Audio-Visual Speech Recognition Learners
Research Experience
- Google DeepMind, Tokyo, Japan (May 2025 - Present)
Position: Research Scientist
- Meta, London, UK (May 2024 - Mar. 2025)
Position: Postdoctoral Researcher
- Carnegie Mellon University (CMU), Pittsburgh, USA (Aug. 2023 - Oct. 2023)
Position: Visiting Scholar
Mentor: Prof. Shinji Watanabe
Education
- KAIST, Daejeon, Korea (Feb. 2019 - Feb. 2024)
Degree: Ph.D. in Electrical Engineering
Advisor: Prof. Yong Man Ro
- Yonsei Univ., Seoul, Korea (Feb. 2013 - Feb. 2019)
Degree: B.S. in Electrical and Electronic Engineering
Graduated with High Honors / Early Graduation (1 year)
Background
Research Interests: Multi-modal AI, Gen AI, Audio-Visual Speech Processing
Position: Research Scientist at Google DeepMind
Co-authors
28 total
Yong Man Ro
Professor of Electrical Engineering, KAIST, ICT Endowed Chair Professor
Joanna Hong
Google DeepMind
Jeongsoo Choi
KAIST
Se Jin Park
Korea Advanced Institute of Science and Technology (KAIST)
Jeong Hun Yeo
Korea Advanced Institute of Science and Technology
Stavros Petridis
GenAI Research Director, NatWest Group / Research Fellow, Imperial College London
Shinji Watanabe
Carnegie Mellon University