Yihan Wu

Google Scholar ID: P2K_kOUAAAAJ
Renmin University of China
Speech synthesis, speech generation, multimodal learning
Citations & Impact
  • Citations (all-time): 442
  • H-index: 9
  • i10-index: 9
  • Publications: 20
  • Co-authors: 4
Resume (English only)
Academic Achievements
  • Selected papers:
    - “Enhancing Audiovisual Speech Recognition through Bifocal Preference Optimization”, accepted by AAAI 2025;
    - “Robust Audiovisual Speech Recognition Models with Mixture-of-Experts”, SLT 2024;
    - “ESPnet-Codec: Comprehensive Training and Evaluation of Neural Codecs for Audio, Music, and Speech”, SLT 2024;
    - “TiVA: Time-Aligned Video-to-Audio Generation”, ACM MM 2024;
    - “VideoDubber: Machine Translation with Speech-Aware Length Control for Video Dubbing”, AAAI 2023;
    - “AdaSpeech 4: Adaptive Text to Speech in Zero-Shot Scenarios”, INTERSPEECH 2022;
    - “Self-Supervised Context-Aware Style Representation for Expressive Speech Synthesis”, INTERSPEECH 2022.
Research Experience
  • Visiting Scholar: Language Technologies Institute, Carnegie Mellon University (Sep. 2023 - Sep. 2024), worked with Prof. Shinji Watanabe;
  • Research Intern: Microsoft Research Asia (Oct. 2021 - Oct. 2022), worked with Xu Tan;
  • Research Intern: Microsoft C+AI, Speech Team (May 2021 - Oct. 2021), worked with Xi Wang and Lei He.
Education
  • Earned a B.S. degree from Shandong University in 2021; currently a fourth-year Ph.D. student at the Gaoling School of Artificial Intelligence, Renmin University of China, under the supervision of Prof. Ruihua Song.
Background
  • Broadly interested in speech-related research, including speech synthesis, speech recognition, and speech language models.
Miscellany
  • Currently looking for summer internships; also on the job market.