Scholar

Yihan Wu

Google Scholar ID: P2K_kOUAAAAJ

Renmin University of China

Speech synthesis, speech generation, multimodal learning

Citations & Impact

Citations (all-time): 442



Resume (English only)

Academic Achievements

Selected papers:
- “Enhancing Audiovisual Speech Recognition through Bifocal Preference Optimization”, accepted by AAAI 2025;
- “Robust Audiovisual Speech Recognition Models with Mixture-of-Experts”, SLT 2024;
- “ESPnet-Codec: Comprehensive Training and Evaluation of Neural Codecs for Audio, Music, and Speech”, SLT 2024;
- “TiVA: Time-Aligned Video-to-Audio Generation”, ACM MM 2024;
- “VideoDubber: Machine Translation with Speech-Aware Length Control for Video Dubbing”, AAAI 2023;
- “AdaSpeech 4: Adaptive Text to Speech in Zero-Shot Scenarios”, INTERSPEECH 2022;
- “Self-Supervised Context-Aware Style Representation for Expressive Speech Synthesis”, INTERSPEECH 2022.

Research Experience

Visiting Scholar: Language Technologies Institute, Carnegie Mellon University (Sep. 2023 - Sep. 2024), worked with Prof. Shinji Watanabe;
Research Intern: Microsoft Research Asia (Oct. 2021 - Oct. 2022), worked with Xu Tan;
Research Intern: Microsoft C+AI, Speech Team (May 2021 - Oct. 2021), worked with Xi Wang and Lei He.

Education

Earned a B.S. degree from Shandong University in 2021; currently a fourth-year Ph.D. student at the Gaoling School of Artificial Intelligence, Renmin University of China, supervised by Prof. Ruihua Song.

Background

Broadly interested in speech-related research, including speech synthesis, speech recognition, and speech language models. Currently a fourth-year Ph.D. student at the Gaoling School of Artificial Intelligence, Renmin University of China.

Miscellany

Co-authors

4 total