Journal paper 'MSMC-TTS: Multi-Stage Multi-Codebook VQ-VAE based Neural TTS' accepted by IEEE/ACM TASLP.
Conference paper 'A Multi-Stage Multi-Codebook VQ-VAE Approach to High-Performance Neural TTS' accepted by INTERSPEECH 2022.
Conference paper 'A Multi-Scale Time-Frequency Spectrogram Discriminator for GAN-based Non-Autoregressive TTS' accepted by INTERSPEECH 2022.
Conference paper 'Improving Adversarial Waveform Generation based Singing Voice Conversion with Harmonic Signals' accepted by ICASSP 2022.
Conference paper 'Conversational End-to-End TTS for Voice Agents' accepted by SLT 2020.
Conference paper 'A New GAN-based End-to-End TTS Training Algorithm' accepted by INTERSPEECH 2019.
Conference paper 'Exploiting Syntactic Features in a Parsed Tree to Improve End-to-End TTS' accepted by INTERSPEECH 2019.
Preprint 'Towards High-Quality Neural TTS for Low-Resource Languages by Learning Compact Speech Representations' submitted to arXiv.
Preprint 'Phonetic Posteriorgrams based Many-to-Many Singing Voice Conversion via Adversarial Training' submitted to arXiv.
Research Experience
Applied Scientist Intern, Amazon (Cambridge, UK), Jun 2023 – Nov 2023: Developed large-scale TTS systems based on large language models (LLMs).
Research Intern, Xiaohongshu (Beijing, China), Aug 2020 – May 2022: Investigated applications of speech representations in TTS.
Researcher, Sogou Inc. (Beijing, China), Dec 2020 – Jul 2021: Worked on singing voice conversion, aiming to build a commercial system with high sound quality and accurate melody expression.
Research Intern, Tencent AI Lab (Beijing, China), May 2020 – Dec 2020: Researched multi-singer singing voice conversion; proposed a MelGAN-based end-to-end PPG-SVC model.
Research Intern, Microsoft Research Asia & Microsoft STCA (Beijing, China), May 2018 – Sep 2019: Supervised by Frank K. Soong and Lei He; focused on improving the robustness and naturalness of end-to-end TTS.