Released the CritPT benchmark, revealed skill-composition learning in RL, discovered the impact of entropy collapse on RL scaling, proposed entropy minimization to elicit LLM capabilities, published the Eurus paper, introduced the PRIME solution, released the Implicit PRM, and more. Contributed to multiple research papers, including 'From f(x) and g(x) to f(g(x)): LLMs Learn New Skills in RL by Composing Old Ones' and 'Process Reinforcement through Implicit Rewards'.
Research Experience
Before joining UIUC as a PhD student, collaborated on research projects with THUNLP and with Prof. Heng Ji's group.
Education
PhD student at the University of Illinois Urbana-Champaign (since Fall 2024), advised by Prof. Hao Peng. Previously worked with Prof. Zhiyuan Liu at THUNLP and Prof. Heng Ji at UIUC.
Background
Research interests center on automating AI research through self-evolution or scalable oversight, and on advancing science. Specific research directions include: (1) scalable data synthesis to support the continued scaling of compute for improving LLMs; (2) scalable evaluation methods that unlock and amplify LLMs' ability to provide feedback; (3) scalable training algorithms that incorporate such feedback to further improve LLMs.