
Yifan Peng

Google Scholar ID: wH2FALMAAAAJ

NVIDIA

Multimodal LLMs, Speech-to-Speech, Large Language Models, Speech Recognition

Citations & Impact

All-time citations: 1,524


Contact

Publications

20 items


Resume (English only)

Academic Achievements

Published papers at top-tier venues including TMLR, ICML, ICLR, ACL, EMNLP, NAACL, AAAI, ICASSP, INTERSPEECH, etc. Recipient of the INTERSPEECH 2025 Best Student Paper Award (first-authored), EMNLP 2024 Best Paper Award, IEEE SLT 2024 Best Paper Award, ICASSP 2023 Top 3% Paper Recognition (two first-authored and one co-authored), and SPIE Medical Imaging 2020 Best Student Paper Award Finalist (first-authored).

Research Experience

Interned as a Research Scientist at NVIDIA NeMo (Summer 2024), Meta FAIR (Summer 2023), and ASAPP (Summer 2022), conducting research on speech language models and speech recognition. Led the Open Whisper-style Speech Models (OWSM) project at CMU WAVLab and was a core contributor to ESPnet, a widely used speech processing toolkit.

Education

Ph.D. in Electrical and Computer Engineering from Carnegie Mellon University (2025), supervised by Prof. Shinji Watanabe (Sep 2021 - May 2025) and Prof. Ian Lane (Aug 2020 - Aug 2021; now at UC Santa Cruz). Bachelor’s degree from the Department of Electronic Engineering at Tsinghua University (2020).

Background

Research interests: Building open multimodal foundation models, particularly for speech and language processing. Recent focus has been on multimodal large language models (LLMs) and full-duplex speech-to-speech dialog systems.

Miscellany