Haohan Guo

Google Scholar ID: B-ZmwcMAAAAJ
The Chinese University of Hong Kong
Speech Synthesis, Voice Conversion, Speech Processing
Citations & Impact (all-time)
Citations: 824
H-index: 13
i10-index: 16
Publications: 20
Co-authors: 12
Resume
Academic Achievements
  • Journal paper 'MSMC-TTS: Multi-Stage Multi-Codebook VQ-VAE based Neural TTS' accepted by IEEE/ACM TASLP.
  • Conference paper 'A Multi-Stage Multi-Codebook VQ-VAE Approach to High-Performance Neural TTS' accepted by INTERSPEECH 2022.
  • Conference paper 'A Multi-Scale Time-Frequency Spectrogram Discriminator for GAN-based Non-Autoregressive TTS' accepted by INTERSPEECH 2022.
  • Conference paper 'Improving Adversarial Waveform Generation based Singing Voice Conversion with Harmonic Signals' accepted by ICASSP 2022.
  • Conference paper 'Conversational End-to-End TTS for Voice Agents' accepted by SLT 2020.
  • Conference paper 'A New GAN-based End-to-End TTS Training Algorithm' accepted by INTERSPEECH 2019.
  • Conference paper 'Exploiting Syntactic Features in a Parsed Tree to Improve End-to-End TTS' accepted by INTERSPEECH 2019.
  • Preprint 'Towards High-Quality Neural TTS for Low-Resource Languages by Learning Compact Speech Representations' posted on arXiv.
  • Preprint 'Phonetic Posteriorgrams based Many-to-Many Singing Voice Conversion via Adversarial Training' posted on arXiv.
Research Experience
  • Applied Scientist Intern, Amazon (Cambridge, UK), Jun 2023 – Nov 2023: Developed large-scale TTS systems based on large language models (LLMs).
  • Research Intern, Xiaohongshu (Beijing, China), Aug 2020 – May 2022: Investigated applications of speech representations in TTS.
  • Researcher, Sogou Inc. (Beijing, China), Dec 2020 – Jul 2021: Worked on singing voice conversion, with the aim of building a commercial system with high sound quality and accurate melody expression.
  • Research Intern, Tencent AI Lab (Beijing, China), May 2020 – Dec 2020: Researched multi-singer singing voice conversion; proposed a MelGAN-based end-to-end PPG-SVC model.
  • Research Intern, Microsoft Research Asia & Microsoft STCA (Beijing, China), May 2018 – Sep 2019: Supervised by Frank K. Soong and Lei He; focused on improving robustness and naturalness of end-to-end TTS.