
Wangyou Zhang

Google Scholar ID: LQ_ApskAAAAJ

Assistant Professor, School of Artificial Intelligence, Shanghai Jiao Tong University

Speech Separation and Enhancement, Robust Speech Recognition, Speech Representation Learning


Citations & Impact

Citations (all-time): 2,537



Publications

15 items

TF-MoE: Time-Frequency Mixture-of-Experts for Efficient Speech Separation

2026


ESPnet3: Infrastructure for Scalable Speech and Audio Research in the Foundation Model Era

2026


On the Distillation Loss Functions of Speech VAE for Unified Reconstruction, Understanding, and Generation

2026


Representation-Regularized Convolutional Audio Transformer for Audio Understanding

2026


UrgentMOS: Unified Multi-Metric and Preference Learning for Robust Speech Quality Assessment

2026


ICASSP 2026 URGENT Speech Enhancement Challenge

2026


PURE Codec: Progressive Unfolding of Residual Entropy for Speech Codec Learning

2025


Less is More: Data Curation Matters in Scaling Speech Enhancement

2025


Resume

Academic Achievements

Published 52 papers in top-tier speech conferences and journals, such as TASLP, SPM, ICASSP, and Interspeech. Recipient of the ASRU 2019 Best Paper Award, EMNLP 2024 Best Paper Award, and MSRA 2021 Fellowship. Serves as a regular reviewer for top-tier conferences and journals, including TASLP, SPL, Speech Communication, ICASSP, and Interspeech. Received the Best Reviewer Award at ASRU 2023. Actively contributes to open-source projects, especially ESPnet (one of the most popular end-to-end speech processing toolkits), where he serves as one of the core maintainers.

Research Experience

Member of the AudioCC Lab led by Prof. Yanmin Qian.

Education

Received his Ph.D. degree from Shanghai Jiao Tong University in 2024 and his B.Sc. degree from Huazhong University of Science and Technology in 2018. Advisor: Prof. Yanmin Qian.

Background

Research interests: Speech signal processing in complex scenarios, including speech enhancement, separation, recognition, and self-supervised speech pre-training. Also highly interested in anything that helps to better understand AI.

Miscellany