Academic Achievements
He has published influential research papers with 15,000+ citations, including two best papers and several top-cited papers at AI conferences. Many of the technologies he developed have been deployed in products, including Kimi-Video/Kimi-TTS, neural machine translation, pre-training models (MASS, MPNet), TTS (FastSpeech 1/2), ASR (FastCorrect 1/2), and AI music. He and his team maintain several open-source projects on GitHub with over 30K stars in total, such as HuggingGPT/JARVIS, Kimi-Audio, MASS, MPNet, and Muzic. He serves as an Action Editor of Transactions on Machine Learning Research (TMLR), an Area Chair or Meta Reviewer for NeurIPS/ICML/AAAI/ICASSP, a senior member of IEEE, and a member of the standing committee on Computational Art of the China Computer Federation (CCF).
Research Experience
He is currently the Research VP of Multimodality at Moonshot AI (a.k.a. Kimi). Previously, he designed several models and systems for video (e.g., Kimi-Video, LanDiff, GAIA), audio (e.g., Kimi-Audio, FastSpeech 1/2, NaturalSpeech 1/2/3, Muzic), language (e.g., MASS, MPNet), and AI agents (e.g., HuggingGPT).
Background
His research covers LLMs, multimodality, and generative AI for video and audio. He was previously a Principal Research Manager in the Machine Learning Group at Microsoft Research Asia (MSRA).