P1: Mastering Physics Olympiads with Reinforcement Learning

📅 2025-11-17
🤖 AI Summary
Existing large language models (LLMs) exhibit limited scientific reasoning on Olympiad-level physics problems, particularly in complex physical modeling, multi-step deductive reasoning, and the systematic application of conservation laws. Method: We propose a pure reinforcement learning (RL) training paradigm and introduce PhysicsMinions, a lightweight multi-agent framework that enables role-based collaborative reasoning and physics-knowledge-guided stepwise problem solving. Contribution/Results: Our released P1 series models (e.g., P1-235B-A22B) achieve the first open-source LLM gold medal at IPhO 2025. Across 13 international physics competitions, they attain 12 gold medals and 1 silver medal, ranking first in both aggregate score and average performance and significantly surpassing state-of-the-art baselines. This work establishes a reproducible, scalable paradigm for scientific AI in physics reasoning.

📝 Abstract
Recent progress in large language models (LLMs) has moved the frontier from puzzle-solving to science-grade reasoning: the kind needed to tackle problems whose answers must stand against nature, not merely fit a rubric. Physics is the sharpest test of this shift, because it binds symbols to reality in a fundamental way and serves as the cornerstone of most modern technologies. In this work, we advance physics research by developing large language models with exceptional physics reasoning capabilities, excelling in particular at solving Olympiad-level physics problems. We introduce P1, a family of open-source physics reasoning models trained entirely through reinforcement learning (RL). Among them, P1-235B-A22B is the first open-source model to achieve gold-medal performance at the latest International Physics Olympiad (IPhO 2025), and it wins 12 gold medals across 13 international and regional physics competitions held in 2024/2025. P1-30B-A3B also surpasses almost all other open-source models on IPhO 2025, earning a silver medal. Further equipped with the agentic framework PhysicsMinions, P1-235B-A22B+PhysicsMinions achieves the overall No. 1 result on IPhO 2025 and obtains the highest average score across the 13 physics competitions. Beyond physics, the P1 models also perform strongly on other reasoning tasks such as math and coding, demonstrating the broad generalizability of the P1 series.
Problem

Research questions and friction points this paper addresses.

Developing large language models for physics reasoning
Solving Olympiad-level physics problems using reinforcement learning
Advancing science-grade reasoning beyond puzzle-solving capabilities
Innovation

Methods, ideas, or system contributions that make the work stand out.

Training physics reasoning models with reinforcement learning
Achieving gold-medal performance in Physics Olympiads
Integrating agentic frameworks for enhanced problem-solving capabilities
Authors
Jiacheng Chen (P1 Team, Shanghai AI Laboratory)
Qianjia Cheng (Shanghai AI Lab)
Fangchen Yu (Ph.D. Candidate, The Chinese University of Hong Kong, Shenzhen)
Haiyuan Wan (P1 Team, Shanghai AI Laboratory)
Yuchen Zhang (P1 Team, Shanghai AI Laboratory)
Shenghe Zheng (Harbin Institute of Technology)
Junchi Yao (University of Electronic Science and Technology of China; Shanghai AI Lab)
Qingyang Zhang (PhD student, Tianjin University)
Haonan He (P1 Team, Shanghai AI Laboratory)
Yun Luo (Shanghai AI Lab)
Yufeng Zhao (P1 Team, Shanghai AI Laboratory)
Futing Wang (P1 Team, Shanghai AI Laboratory)
Li Sheng (P1 Team, Shanghai AI Laboratory)
Chengxing Xie (P1 Team, Shanghai AI Laboratory)
Yuxin Zuo (P1 Team, Shanghai AI Laboratory)
Yizhuo Li (The University of Hong Kong)
Wenxuan Zeng (Peking University)
Yulun Wu (P1 Team, Shanghai AI Laboratory)
Rui Huang (P1 Team, Shanghai AI Laboratory)
Dongzhan Zhou (Researcher at Shanghai AI Lab)
Kai Chen (P1 Team, Shanghai AI Laboratory)
Yu Qiao (P1 Team, Shanghai AI Laboratory)
Lei Bai (Shanghai AI Laboratory)
Yu Cheng (P1 Team, Shanghai AI Laboratory)
Ning Ding (P1 Team, Shanghai AI Laboratory)