Browse publications on Google Scholar (top-right) ↗
Resume (English only)
Academic Achievements
Published papers at top-tier venues, including TMLR, ICML, ICLR, ACL, EMNLP, NAACL, AAAI, ICASSP, and INTERSPEECH. Recipient of the INTERSPEECH 2025 Best Student Paper Award (first-authored), EMNLP 2024 Best Paper Award, IEEE SLT 2024 Best Paper Award, and ICASSP 2023 Top 3% Paper Recognition (two first-authored papers and one co-authored paper), and finalist for the SPIE Medical Imaging 2020 Best Student Paper Award (first-authored).
Research Experience
Interned as a Research Scientist at NVIDIA NeMo (Summer 2024), Meta FAIR (Summer 2023), and ASAPP (Summer 2022), conducting research on speech language models and speech recognition. Led the Open Whisper-style Speech Models (OWSM) project at CMU WAVLab and was a core contributor to ESPnet, a widely used speech processing toolkit.
Education
Ph.D. in Electrical and Computer Engineering from Carnegie Mellon University (2025), supervised by Prof. Shinji Watanabe (Sep 2021 - May 2025) and Prof. Ian Lane (Aug 2020 - Aug 2021; now at UC Santa Cruz). Bachelor’s degree from the Department of Electronic Engineering at Tsinghua University (2020).
Background
Research interests: building open multimodal foundation models, particularly for speech and language processing. Recent work focuses on multimodal large language models (LLMs) and full-duplex speech-to-speech dialog systems.
Miscellany
Currently working as a Research Scientist at NVIDIA.