Released GVM - Gradient Variance Minimization, a framework to improve data sampling efficiency in LLM math reasoning.
Wrote a report analyzing what makes GRPO 'stand out' for math reasoning, with analysis and ablation studies comparing different algorithms for LLM reasoning training.
Released FANS - Formal Answer Selection for Natural Language Math Reasoning Using Lean4, which improves test-time math answer selection using a formal language.
Published the paper 'Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL'.
Published the paper 'A Minimalist Approach to LLM Reasoning: from Rejection Sampling to Reinforce'.
Published the paper 'FANS – Formal Answer Selection for Natural Language Math Reasoning Using Lean4'.
Research Experience
Started the PhD journey in the CS School of UIUC in August 2024.
Education
First-year CS PhD student in the Siebel School of Computing and Data Science, University of Illinois Urbana-Champaign (UIUC), supervised by Prof. Tong Zhang; Bachelor of Engineering from Yao Class, Tsinghua University.
Background
Main research interests are reinforcement learning and large language models, especially autonomous agent learning and model reasoning, as well as interdisciplinary fields.
Miscellany
Blog includes a Japan travel journal with photo collections.