Publications: NDVQ: Robust Neural Audio Codec with Normal Distribution-Based Vector Quantization (SLT 2024); Semantic-VAE: Semantic-Alignment Latent Representation for Better Speech Synthesis; Accelerating Diffusion-based Text-to-Speech Model Training with Dual Modality Alignment (InterSpeech 2025 Oral); F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching (ACL 2025 Main), whose code has received over 13,000 stars on GitHub; and other research works.
Research Experience
2025.07–present, Research Intern, Minimax Speech Team; 2025.01–2025.06, Research Intern, Shanghai Artificial Intelligence Laboratory; 2023.08–2024.09, Research Intern, Natural Language Computing Group (NLC), Microsoft Research Asia (MSRA), led by Furu Wei, supervised by Shujie Liu and Long Zhou, focusing on Audio Codec and Speech Synthesis.
Education
2020.09–2024.06, Bachelor of Engineering, School of Artificial Intelligence, Xidian University.
Background
Research Interests: Audio Signal Processing, Audio Codec Models, Multimodal Large Language Models, Machine Learning, and Deep Learning. Supervised by Prof. Xie Chen.
Miscellany
Contact: zhikangniu@sjtu.edu.cn; Social Media/Personal Links: GitHub, Google Scholar, WeChat, CV