Released the CritPT benchmark, revealed skill-composition learning in RL, discovered the impact of entropy collapse on RL scaling, proposed entropy minimization to elicit LLM capabilities, published the Eurus paper, introduced the PRIME solution, released the Implicit PRM, and more. Contributed to multiple research papers, including 'From f(x) and g(x) to f(g(x)): LLMs Learn New Skills in RL by Composing Old Ones' and 'Process Reinforcement through Implicit Rewards'.
Research Experience
Before joining UIUC as a PhD student, collaborated on research projects with THUNLP and with Prof. Heng Ji's group.
Education
PhD student at the University of Illinois Urbana-Champaign (since Fall 2024), advised by Prof. Hao Peng. Previously worked with Prof. Zhiyuan Liu at THUNLP and Prof. Heng Ji at UIUC.
Background
Research interests center on automating AI research through self-evolution or scalable oversight, and on advancing science. Specific research directions include: (1) scalable data synthesis to support the continued scaling of compute for improving LLMs; (2) scalable evaluation methods that unlock and amplify LLMs' ability to provide feedback; (3) scalable training algorithms that incorporate such feedback to further improve LLMs.