Zhikang Niu

Google Scholar ID: mXSpi2kAAAAJ

Shanghai Jiao Tong University

Speech Synthesis

Citations & Impact

All-time

Citations

272

Contact

Email: zhikangniu@sjtu.edu.cn

Publications

20 items

MMAE: A Massive Multitask Audio Editing Benchmark

2026

WavTTS: Towards High-Quality Zero-Shot TTS via Direct Raw Waveform Modeling

2026

WavCube: Unifying Speech Representation for Understanding and Generation via Semantic-Acoustic Joint Modeling

2026

X-Voice: Enabling Everyone to Speak 30 Languages via Zero-Shot Cross-Lingual Voice Cloning

2026

Habibi: Laying the Open-Source Foundation of Unified-Dialectal Arabic Speech Synthesis

2026

SLAM-LLM: A Modular, Open-Source Multimodal Large Language Model Framework and Best Practice for Speech, Language, Audio and Music Processing

IEEE Journal on Selected Topics in Signal Processing · 2026

Task Vector in TTS: Toward Emotionally Expressive Dialectal Speech Synthesis

2025

STAR-Bench: Probing Deep Spatio-Temporal Reasoning as Audio 4D Intelligence

2025

Resume

Academic Achievements

Publications:

NDVQ: Robust Neural Audio Codec with Normal Distribution-Based Vector Quantization (SLT 2024)

Semantic-VAE: Semantic-Alignment Latent Representation for Better Speech Synthesis

Accelerating Diffusion-based Text-to-Speech Model Training with Dual Modality Alignment (Interspeech 2025 Oral)

F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching (ACL 2025 Main); its code has collected over 13,000 stars on GitHub

Other research works.

Research Experience

2025.07–present, Research Intern, Minimax Speech Team.

2025.01–2025.06, Research Intern, Shanghai Artificial Intelligence Laboratory.

2023.08–2024.09, Research Intern, Natural Language Computing Group (NLC), Microsoft Research Asia (MSRA), led by Furu Wei and supervised by Shujie Liu and Long Zhou, focusing on Audio Codec and Speech Synthesis.

Education

2020.09–2024.06, Bachelor of Engineering, School of Artificial Intelligence, Xidian University.

Background

Research Interests: Audio Signal Processing, Audio Codec Model, Multimodal Large Language Model, Machine Learning, and Deep Learning. Supervised by Prof. Xie Chen.
