Jeong Hun Yeo

Google Scholar ID: PJoYv2cAAAAJ
Korea Advanced Institute of Science and Technology
Audio-Visual Speech Recognition, Multimodal Learning
Citations & Impact
All-time
  • Citations: 273
  • H-index: 8
  • i10-index: 7
  • Publications: 13
  • Co-authors: 11
Academic Achievements
  • AKVSR: Audio Knowledge Empowered Visual Speech Recognition by Compressing Audio Knowledge of a Pretrained Model, IEEE Transactions on Multimedia (TMM), 2024
  • Zero-AVSR: Zero-Shot Audio-Visual Speech Recognition with LLMs by Learning Language-Agnostic Speech Representations, IEEE/CVF International Conference on Computer Vision (ICCV), 2025
  • MMS-LLaMA: Efficient LLM-based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens, Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL Findings), 2025
  • Personalized Lip Reading: Adapting to Your Unique Lip Movements with Vision and Language, AAAI Conference on Artificial Intelligence (AAAI), 2025
  • Where Visual Speech Meets Language: VSP-LLM Framework for Efficient and Context-Aware Visual Speech Processing, Conference on Empirical Methods in Natural Language Processing (EMNLP Findings), 2024
  • Efficient Training for Multilingual Visual Speech Recognition: Pre-training with Discretized Visual Speech Representation, ACM International Conference on Multimedia (ACM MM), 2024
  • Let's Go Real Talk: Spoken Dialogue Model for Face-to-Face Conversation, Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), 2024 (Oral Presentation)
  • Visual Speech Recognition for Languages with Limited Labeled Data using Automatic Labels from Whisper, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
  • Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal Tokens, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
  • Lip Reading for Low-resource Languages by Learning and Combining General Speech Knowledge and Language-specific Knowledge, IEEE/CVF International Conference on Computer Vision (ICCV), 2023
  • Multi-Temporal Lip-Audio Memory for Visual Speech Recognition, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Background
  • Ph.D. candidate at the KAIST Integrated Vision & Language Lab; research interests include audio-knowledge-empowered visual speech recognition, zero-shot audio-visual speech recognition, and related multimodal learning topics.
Miscellany
  • Contact: sedne246@kaist.ac.kr. Google Scholar and LinkedIn profiles are available.