2025: 'ALMTokenizer: A Low-bitrate and Semantic-rich Audio Codec Tokenizer for Audio Language Modeling' accepted by ICML 2025
Apr 2025: Released technical report of Kimi-Audio (with code, model, and paper)
2024: 'UniAudio: Towards Universal Audio Generation with Large Language Models' accepted by ICML 2024
2024: 'InstructTTS: Modelling Expressive TTS in Discrete Latent Space with Natural Language Style Prompt' accepted by IEEE/ACM TASLP (corresponding author)
2023: Released InstructTTS, an expressive TTS system driven by natural language style prompts (arXiv)
2022: Published DiffGAN-TTS paper (arXiv)
2021: DiffSVC paper accepted by ASRU 2021
2021: Published singing voice conversion work based on denoising diffusion probabilistic models (DDPMs) on arXiv
2021: FastSVC paper accepted as an oral presentation at ICME 2021
Multiple IEEE/ACM TASLP journal papers on voice conversion, emotive speech synthesis, and speech emotion recognition