AKVSR: Audio Knowledge Empowered Visual Speech Recognition by Compressing Audio Knowledge of a Pretrained Model, IEEE Transactions on Multimedia (TMM), 2024
Zero-AVSR: Zero-Shot Audio-Visual Speech Recognition with LLMs by Learning Language-Agnostic Speech Representations, IEEE/CVF International Conference on Computer Vision (ICCV), 2025
MMS-LLaMA: Efficient LLM-based Audio-Visual Speech Recognition with Minimal Multimodal Speech Tokens, Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL Findings), 2025
Personalized Lip Reading: Adapting to Your Unique Lip Movements with Vision and Language, AAAI Conference on Artificial Intelligence (AAAI), 2025
Where Visual Speech Meets Language: VSP-LLM Framework for Efficient and Context-Aware Visual Speech Processing, Findings of the Conference on Empirical Methods in Natural Language Processing (EMNLP Findings), 2024
Efficient Training for Multilingual Visual Speech Recognition: Pre-training with Discretized Visual Speech Representation, ACM International Conference on Multimedia (ACM MM), 2024
Let's Go Real Talk: Spoken Dialogue Model for Face-to-Face Conversation, Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL, Oral), 2024
Visual Speech Recognition for Languages with Limited Labeled Data using Automatic Labels from Whisper, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal Tokens, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Lip Reading for Low-resource Languages by Learning and Combining General Speech Knowledge and Language-specific Knowledge, IEEE/CVF International Conference on Computer Vision (ICCV), 2023
Multi-Temporal Lip-Audio Memory for Visual Speech Recognition, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Background
Ph.D. candidate at the KAIST Integrated Vision & Language Lab. Research interests include audio-knowledge-empowered visual speech recognition and zero-shot audio-visual speech recognition, among related topics.
Miscellany
Contact: sedne246@kaist.ac.kr. Google Scholar and LinkedIn profiles are available on request.