Joanna Hong

Google Scholar ID: wqvP0D8AAAAJ
Google DeepMind
Audio Processing · Speech Processing · Large Language Model · Multimodal
Citations & Impact
All-time
  • Citations: 522
  • H-index: 10
  • i10-index: 11
  • Publications: 18
  • Co-authors: 7
Academic Achievements
  • Conference Papers:
  • 1. Let's Go Real Talk: Spoken Dialogue Model for Face-to-Face Conversation, ACL 2024 (Oral)
  • 2. Intuitive Multilingual Audio-Visual Speech Recognition with a Single-Trained Model, EMNLP 2023
  • 3. DiffV2S: Diffusion-based Video-to-Speech Synthesis with Vision-guided Speaker Embedding, ICCV 2023
  • 4. Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling and Reliability Scoring, CVPR 2023
  • 5. Lip-to-Speech Synthesis in the Wild with Multi-task Learning, ICASSP 2023
  • 6. VisageSynTalk: Unseen Speaker Video-to-Speech Synthesis via Speech-Visage Feature Selection, ECCV 2022
  • 7. Visual Context-driven Audio Feature Enhancement for Robust End-to-End Audio-Visual Speech Recognition, Interspeech 2022 (Oral)
  • 8. SyncTalkFace: Talking Face Generation with Precise Lip-syncing via Audio-Lip Memory, AAAI 2022 (Oral)
  • 9. Lip to Speech Synthesis with Visual Context Attentional GAN, NeurIPS 2021
  • 10. Multi-Modality Associative Bridging Through Memory: Speech Sound Recollected From Face Video, ICCV 2021
Research Experience
  • 1. Research Scientist at Google DeepMind, 2025 - present, advancing speech and audio capabilities for Audio Gemini.
  • 2. Member of Technical Staff at Trillion Labs, 2024 - 2025, contributed to the development of the Trillion-7B and Tri-21B models, multilingual large language models designed for practical, real-world applications.
  • 3. Research Scientist Intern at Meta Reality Labs, 2023 - 2024, worked on robust audio-visual representation learning under missing-modality scenarios, enabling recovery of absent information when only a single modality (e.g., audio or video) is available.
Education
  • Ph.D. in Electrical Engineering from KAIST, advised by Professor Yong Man Ro in the Integrated Vision Language Lab. Her thesis focused on human speech understanding through multimodal representation learning and was recognized with the Outstanding Dissertation Award from the School of Electrical Engineering.
Background
  • Research Interests: Building robust and scalable speech and audio technologies for human-AI interaction, including speech enhancement, separation, and speaker diarization. Also interested in multimodal learning that integrates audio, visual, and textual modalities to improve machine understanding.