Dongchao Yang

Google Scholar ID: WNiojyAAAAAJ
Chinese University of Hong Kong
TTS, TTA, Audio Codec, Multi-modal Audio Foundation Models
Citations & Impact (All-time)
Citations: 2,696
H-index: 19
i10-index: 27
Publications: 20
Co-authors: 16 (list available)
Resume (English only)
Academic Achievements
  • 1. Diffsound: Discrete Diffusion Model for Text-to-Sound Generation, IEEE Transactions on Audio, Speech and Language Processing, 2023.
  • 2. InstructTTS: Modelling Expressive TTS in Discrete Latent Space with Natural Language Style Prompt, IEEE Transactions on Audio, Speech and Language Processing, 2024.
  • 3. UniAudio: An Audio Foundation Model Toward Universal Audio Generation, ICML, 2024.
  • 4. UniAudio 1.5: Large Language Model-driven Audio Codec is A Few-shot Audio Task Learner, NeurIPS, 2024.
  • 5. ALMTokenizer: A Low-bitrate and Semantic-rich Audio Codec Tokenizer for Audio Language Modeling, ICML, 2025.
  • 6. SimpleSpeech 2: Towards Simple and Efficient Text-to-Speech with Flow-based Scalar Latent Transformer Diffusion Models, IEEE Transactions on Audio, Speech and Language Processing (TASLP), 2025.
  • 7. SimpleSpeech: Towards Simple and Efficient Text-to-Speech with Scalar Latent Transformer Diffusion Models, Proc. Interspeech, 2024.
  • 8. Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models, ICML, 2023.
  • 9. A Mixed Supervised Learning Framework for Target Sound Detection, DCASE Workshop, 2022.
  • 10. AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head, AAAI, 2024 (first released 2023).
Research Experience
  • 1. July 2023 - Sep. 2023, MSRA, Speech Group, Intern, Supervisor: Xu Tan.
  • 2. May 2021 - May 2023, Tencent AI Lab, Speech Group, Intern, Supervisors: Songxiang Liu, Chao Weng, and Bo Wu.
Education
  • 1. The Chinese University of Hong Kong, School of Electronic and Computer Engineering, PhD in progress, August 2023 - Present.
  • 2. Peking University, School of Computer Engineering and Science, Master's, August 2020 - August 2023.
  • 3. Shanghai University, August 2016 - July 2020.
Background
  • Research interests include Audio Foundation Models, Generative Models, Large Language Models, and Audio/Speech Processing. Currently a PhD student at The Chinese University of Hong Kong, supervised by Prof. Helen Meng. Received a Master's degree from Peking University in 2023.
Miscellany
  • Actively looking for collaboration opportunities in areas such as Audio Foundation Models, Generative Models, TTS, and Text-to-Audio.