Papers: 'The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models' etc.; Projects: Released Implicit Process Reward Modeling (ImplicitPRM) and Process Reinforcement through Implicit Rewards (PRIME), and introduced Test-Time Reinforcement Learning (TTRL) and the Entropy Mechanism.
Research Experience
Current Position: Assistant Professor, Department of Electronic Engineering, Tsinghua University; Formerly: Postdoctoral researcher in the same department, advised by Prof. Bowen Zhou.
Education
Ph.D. (2023) from the Department of Computer Science and Technology, Tsinghua University, advised by Prof. Hai-Tao Zheng and co-advised by Prof. Zhiyuan Liu.
Background
Research Interests: Natural language processing and machine learning. Bio: Tenure-track Assistant Professor in the Department of Electronic Engineering, Tsinghua University; previously a postdoctoral researcher in the same department.
Miscellany
Lab Information: Looking for self-motivated Ph.D. students, postdocs, and interns. Research topics include, but are not limited to, scalable reinforcement learning, fundamental theories, and scientific applications of reasoning language models.