Published papers at top-tier conferences and journals, including NeurIPS, ICML, ICLR, ACL, IEEE SLT, and IEEE TASLP
NaturalSpeech 3 ('Factorized Diffusion Models are Natural and Zero-shot Speech Synthesizers') accepted as an Oral presentation at ICML 2024
MaskGCT accepted at ICLR 2025
'Advancing Zero-shot Text-to-Speech Intelligibility across Diverse Domains via Preference Alignment' accepted at ACL 2025 main conference
'Metis: A Foundation Speech Generation Model with Masked Generative Pre-training' and 'TaDiCodec: Text-aware Diffusion Speech Tokenizer for Speech Language Modeling' accepted at NeurIPS 2025
'SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words' accepted at NeurIPS 2024
'Amphion' and 'Emilia' accepted at IEEE SLT 2024
'AUDIT: Audio Editing by Following Instructions with Latent Diffusion Models' accepted at NeurIPS 2023
TaDiCodec received an Honorable Mention Award at the Nanyang Speech Technology Forum (NYSF) 2025
'Noro: Noise-Robust One-shot Voice Conversion with Hidden Speaker Representation Learning' was a Best Paper Finalist at APSIPA 2025