August 2024: Introduced rStar, a self-play mutual reasoning approach that significantly boosts the reasoning capabilities of Small Language Models (SLMs) at inference time; for example, it improves the GSM8K accuracy of LLaMA2-7B from 12.51% to 63.91%
rStar has been recommended as a key technique in OpenAI o1-like approaches
August 2024: Contributed to the release of Phi3.5-128k LLMs, with significant improvements to LongRoPE for recovering short-context performance after context window extension
July 2024: Released LongRoPE-related work
Work featured on Hugging Face Daily Papers and Jiqizhixin (Machine Heart)
Background
Currently a Principal Researcher in the Systems and Networking Group at Microsoft Research Asia (MSRA)
Broad research interests in AI algorithms
Since joining MSRA, focused on novel algorithms for improving AI inference efficiency, including: (1) compression for pre-trained Transformer models and LLMs; (2) hardware-aware Neural Architecture Search (NAS) for edge AI
Recently, deeply engaged in cutting-edge research problems in Large Language Models (LLMs) and Artificial General Intelligence (AGI)
Actively working on topics such as long-context LLMs and LLM self-play