Publications at top venues: ICLR 2025, ICML 2024, IEEE TSP, ICASSP 2023 (oral), ICRA 2022, INTERSPEECH 2021, ICASSP 2021
Preprints: 'Reward-Robust RLHF in LLMs', 'Uncertainty-aware Reward Model', etc.
Champion, Tencent AI Arena Multi-agent Reinforcement Learning Competition (2022, 2023)
First Prize (2nd place), ICRA RoboMaster University Sim2Real Challenge by DJI (2022)
3rd place, World University Math & Intelligence Competition (Chengdu FISU World University Games)
Research Experience
Feb 2025–present: Research Intern at Moonshot AI, working on general RL for multimodal LLMs and contributing to Kimi models (e.g., Kimi-K2, Kimi-Dev-72B, Kimi-VL)
Oct 2024–Mar 2025: Visiting Researcher at UIUC (6 months), hosted by Tamer Basar
Collaborated with Yu Wang in the Collaborative Intelligence Group
Worked with Kaiqing Zhang and Tamer Basar on theoretical foundations of RL/MARL
Aug 2023–Jan 2025: Research Intern in the RLHF group at Baichuan AI, mentored by Dong Yan
Aug 2020–mid 2023: Research Intern in the Machine Learning Group, Microsoft Research Asia (MSRA), mentored by Xu Tan, Tao Qin, and Tie-Yan Liu