Xu Tan
Google Scholar ID: tob-U1oAAAAJ
Principal Researcher and Research Manager, Microsoft
Large Language Models · Multimodality · Avatar/Video Generation · Speech/Music Generation
Citations & Impact (all-time)
  • Citations: 18,589
  • H-index: 61
  • i10-index: 149
  • Publications: 20
  • Co-authors: 19
Resume (English only)
Academic Achievements
  • He has published influential research papers with 15,000+ citations, including two best papers and several top-cited papers at AI conferences. Many technologies he developed have been deployed in products, including Kimi-Video/Kimi-TTS, neural machine translation, pre-training models (MASS, MPNet), TTS (FastSpeech 1/2), ASR (FastCorrect 1/2), and AI Music. He and his team maintain several open-source projects on GitHub with over 30K stars, including HuggingGPT/JARVIS, Kimi-Audio, MASS, MPNet, and Muzic. He serves as an Action Editor of Transactions on Machine Learning Research (TMLR), an Area Chair or Meta Reviewer for NeurIPS/ICML/AAAI/ICASSP, a Senior Member of the IEEE, and a member of the Standing Committee on Computational Art of the China Computer Federation (CCF).
Research Experience
  • He is currently the Research VP of Multimodality at Moonshot AI (a.k.a. Kimi). Previously, he designed several models/systems for video (e.g., Kimi-Video, LanDiff, GAIA), audio (e.g., Kimi-Audio, FastSpeech 1/2, NaturalSpeech 1/2/3, Muzic), language (e.g., MASS, MPNet), and AI agents (e.g., HuggingGPT).
Background
  • His work covers LLMs, multimodality, and generative AI for video and audio. He was previously a Principal Research Manager in the Machine Learning Group at Microsoft Research Asia (MSRA).