8. SyncTalkFace: Talking Face Generation with Precise Lip-syncing via Audio-Lip Memory, AAAI 2022 (Oral)
9. Lip to Speech Synthesis with Visual Context Attentional GAN, NeurIPS 2021
10. Multi-Modality Associative Bridging Through Memory: Speech Sound Recollected From Face Video, ICCV 2021
Research Experience
1. Research Scientist at Google DeepMind, 2025 - present, advancing speech and audio capabilities for Audio Gemini.
2. Member of Technical Staff at Trillion Labs, 2024 - 2025, contributed to the development of the Trillion-7B and Tri-21B models, multilingual large language models designed for practical, real-world applications.
3. Research Scientist Intern at Meta Reality Labs, 2023 - 2024, worked on robust audiovisual representation learning under missing-modality scenarios, enabling recovery of absent information when only a single modality (e.g., audio or video) is available.
Education
Ph.D. in Electrical Engineering from KAIST, advised by Professor Yong Man Ro in the Integrated Vision Language Lab. Her thesis focused on human speech understanding through multimodal representation learning and was recognized with the Outstanding Dissertation Award from the School of Electrical Engineering.
Background
Research Interests: Building robust and scalable speech and audio technologies for human-AI interaction, including speech enhancement, separation, and speaker diarization. Also interested in multimodal learning that integrates audio, visual, and textual modalities to improve machine understanding.