Published multiple papers on audio understanding and generation. Projects include: improving the accuracy, diversity, temporal accuracy, and efficiency of audio captioning; the task and a weakly-supervised training paradigm for text-to-audio grounding; BLAT, Auto-ACD, and detailed audio-text simulation; visually-enhanced diverse generation; PicoAudio with a temporal-sensitive evaluation benchmark; an audio codec for audio LLMs (SemantiCodec); and content creation with LLM agents, e.g., AI storytelling for children.
Research Experience
Focuses mainly on general audio understanding and generation, including audio captioning, text-to-audio grounding, audio-text retrieval, and text-to-audio generation. Also interested in speech and music understanding and generation, and in how they interact with general audio.
Education
2019.9 - 2025.6, Ph.D., Shanghai Jiao Tong University, supervised by Prof. Mengyue Wu and Prof. Kai Yu
2023.10 - 2024.4, visiting Ph.D., University of Surrey, supervised by Prof. Mark D. Plumbley and Prof. Wenwu Wang
2015.9 - 2019.6, Bachelor, Shanghai Jiao Tong University, supervised by Leyun Wang
Background
A fourth-year Ph.D. candidate at X-LANCE Lab, Shanghai Jiao Tong University, supervised by Prof. Mengyue Wu and Prof. Kai Yu. Research interests include audio, speech, and music understanding and generation, as well as large language models.
Miscellany
Expected to graduate in June 2025 and open to job opportunities in 2025. Can be contacted via LinkedIn or WeChat.