
Songxiang Liu

Google Scholar ID: 4fD1l28AAAAJ

Meituan multi-modal team, PhD (The Chinese University of Hong Kong)

Multi-Modal, LLM, Audio foundation model, Speech synthesis

Citations & Impact

All-time citations: 2,066


Publications

7 items

2026: 3 items (one listed on arXiv.org)
2025: 4 items

Resume (English only)

Academic Achievements

2025: 'ALMTokenizer: A Low-bitrate and Semantic-rich Audio Codec Tokenizer for Audio Language Modeling' accepted by ICML 2025
Apr 2025: Released technical report of Kimi-Audio (with code, model, and paper)
2024: 'UniAudio: Towards Universal Audio Generation with Large Language Models' accepted by ICML 2024
2024: 'InstructTTS: Modelling Expressive TTS in Discrete Latent Space with Natural Language Style Prompt' accepted by IEEE/ACM TASLP (corresponding author)
2023: Released InstructTTS on expressive TTS with natural language prompts (arXiv)
2022: Published DiffGAN-TTS paper (arXiv)
2021: DiffSVC paper accepted by ASRU 2021
2021: Published singing voice conversion work using denoising diffusion probabilistic models (DDPM) (arXiv)
2021: FastSVC paper accepted as an oral presentation at ICME 2021
Multiple IEEE/ACM TASLP journal papers on voice conversion, emotive speech synthesis, and speech emotion recognition

Co-authors

0 total
