Xu Tan
Google Scholar ID: tob-U1oAAAAJ
Principal Researcher and Research Manager, Microsoft
Large Language Models · Multimodality · Avatar/Video Generation · Speech/Music Generation
Citations & Impact (all-time)
  • Citations: 18,589
  • H-index: 61
  • i10-index: 149
  • Publications: 20
  • Co-authors: 19
Resume (English only)
Academic Achievements
  • He has published influential research papers with 15,000+ citations, including two best papers and several top-cited papers at AI conferences. Many technologies he developed have been deployed in products, including Kimi-Video/Kimi-TTS, neural machine translation, pre-training models (MASS, MPNet), TTS (FastSpeech 1/2), ASR (FastCorrect 1/2), and AI Music. He and his team maintain several open-source projects on GitHub with over 30K stars, including HuggingGPT/JARVIS, Kimi-Audio, MASS, MPNet, and Muzic. He serves as an Action Editor of Transactions on Machine Learning Research (TMLR), an Area Chair or Meta Reviewer for NeurIPS/ICML/AAAI/ICASSP, a Senior Member of the IEEE, and a member of the Standing Committee on Computational Art of the China Computer Federation (CCF).
Research Experience
  • He is currently the Research VP of Multimodality at Moonshot AI (a.k.a. Kimi). Previously, he designed several models/systems for video (e.g., Kimi-Video, LanDiff, GAIA), audio (e.g., Kimi-Audio, FastSpeech 1/2, NaturalSpeech 1/2/3, Muzic), language (e.g., MASS, MPNet), and AI agents (e.g., HuggingGPT).
Background
  • His work covers LLMs, multimodality, and generative AI for video and audio. He was previously a Principal Research Manager in the Machine Learning Group at Microsoft Research Asia (MSRA).