Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling

📅 2026-05-13
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work proposes a general-purpose reasoning system designed to achieve gold-medal performance in international olympiads such as the International Mathematical Olympiad (IMO) and the International Physics Olympiad (IPhO). It unifies a reverse-perplexity curriculum for supervised fine-tuning (SFT), two-stage reinforcement learning (RL), and test-time scaling into a single recipe, which supports stable long-horizon reasoning and generalization across disciplines. The RL stage progresses from RL with verifiable rewards to proof-level optimization, strengthening the model's capacity for rigorous proof search and self-verification. The resulting model, SU-01, reaches gold-medal-level performance on recent competitions including IMO 2025, USAMO 2026, and IPhO 2024/2025, and supports stable inference over sequences exceeding 100,000 tokens.
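The summary refers to RL driven by verifiable rewards without giving details. The sketch below illustrates what such a reward commonly looks like for answer-based problems: the final answer is pulled from a `\boxed{...}` span and compared with a reference, and the rewards of several sampled responses to the same prompt are normalized into GRPO-style group-relative advantages. The `\boxed{}` convention, exact string matching, and group normalization are all assumptions made for illustration, not details confirmed by the paper; the function names are hypothetical.

```python
import re
from statistics import mean, pstdev
from typing import List, Optional

# Assumption: the model writes its final answer as \boxed{...} (no nested braces).
BOXED = re.compile(r"\\boxed\{([^{}]*)\}")


def extract_answer(response: str) -> Optional[str]:
    """Return the content of the last \\boxed{...} span, or None if absent."""
    matches = BOXED.findall(response)
    return matches[-1].strip() if matches else None


def verifiable_reward(response: str, reference: str) -> float:
    """Binary reward: 1.0 on exact string match with the reference, else 0.0."""
    answer = extract_answer(response)
    return 1.0 if answer is not None and answer == reference.strip() else 0.0


def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one prompt's sample group (GRPO-style, assumed)."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    samples = [r"... so the answer is \boxed{42}", r"\boxed{41}", "no final answer"]
    rewards = [verifiable_reward(s, "42") for s in samples]
    print(rewards)  # [1.0, 0.0, 0.0]
    print(group_advantages(rewards))
```

Exact string matching is the simplest possible checker and would need answer normalization (for example `1/2` versus `0.5`) in practice. The proof-level RL stage cannot be scored this way at all, since a proof has no single final answer to compare against a reference.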
📝 Abstract
Recent progress in reasoning models has substantially advanced long-horizon mathematical and scientific problem solving, with several systems now reaching gold-medal-level performance on International Mathematical Olympiad (IMO) and International Physics Olympiad (IPhO) problems. In this paper, we introduce a simple and unified recipe for converting a post-trained reasoning backbone into a rigorous olympiad-level solver. The recipe first applies a reverse-perplexity curriculum during supervised fine-tuning (SFT) to instill rigorous proof-search and self-checking behaviors, then scales these behaviors through a two-stage reinforcement learning (RL) pipeline that progresses from RL with verifiable rewards to more delicate proof-level RL, and finally boosts solving performance with test-time scaling. Applying this recipe, we train a 30B-A3B backbone with SFT on around 340K trajectories, each shorter than 8K tokens, followed by 200 RL steps. The resulting model, SU-01, reasons stably on difficult problems with trajectories exceeding 100K tokens and achieves gold-medal-level performance on mathematics and physics olympiad competitions, including IMO 2025, USAMO 2026, and IPhO 2024/2025. It also generalizes its scientific reasoning strongly to domains beyond mathematics and physics.
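The abstract does not spell out how the reverse-perplexity curriculum is constructed. As a minimal sketch, assuming (1) each SFT trajectory has per-token log-probabilities scored by the backbone, (2) "sub-8K-token" means fewer than 8,192 tokens, and (3) "reverse" means presenting high-perplexity trajectories (those the backbone finds least familiar) before low-perplexity ones, the data ordering could look like this. The `Trajectory` class and `build_curriculum` function are hypothetical names, and the paper's actual definition may differ.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Trajectory:
    """Hypothetical SFT example scored by the backbone before training."""
    text: str
    token_logprobs: List[float]  # assumed: log p(token | prefix) under the backbone

    @property
    def num_tokens(self) -> int:
        return len(self.token_logprobs)

    @property
    def perplexity(self) -> float:
        # Perplexity = exp(mean negative log-likelihood per token).
        return math.exp(-sum(self.token_logprobs) / self.num_tokens)


def build_curriculum(
    trajectories: List[Trajectory],
    max_tokens: int = 8192,       # assumed reading of "sub-8K-token"
    high_ppl_first: bool = True,  # assumed reading of "reverse"
) -> List[Trajectory]:
    """Drop empty or over-length trajectories, then order the rest by perplexity."""
    kept = [t for t in trajectories if 0 < t.num_tokens < max_tokens]
    return sorted(kept, key=lambda t: t.perplexity, reverse=high_ppl_first)


if __name__ == "__main__":
    data = [
        Trajectory("familiar proof", [-0.1] * 100),
        Trajectory("unfamiliar proof", [-1.5] * 100),
        Trajectory("too long", [-0.5] * 9000),
    ]
    for t in build_curriculum(data):
        print(t.text, round(t.perplexity, 2))
```

Setting `high_ppl_first=False` yields the conventional low-to-high ordering, which makes it easy to compare both directions if the intended meaning of "reverse" turns out to be the opposite of the one assumed here.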
Problem

Research questions and friction points this paper is trying to address.

Olympiad reasoning
mathematical problem solving
scientific reasoning
long-horizon reasoning
AI reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

reverse-perplexity curriculum
two-stage reinforcement learning
proof-level RL
test-time scaling
olympiad-level reasoning
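Test-time scaling is listed as the final stage of the recipe, but the abstract does not say which form is used. One common form for problems with a checkable final answer is self-consistency: sample several independent solutions and return the most frequent answer. The sketch below assumes that form; `sample_fn` is a hypothetical stand-in for one model call, and nothing here should be read as SU-01's actual inference procedure.

```python
from collections import Counter
from typing import Callable, List, Optional


def majority_vote(answers: List[Optional[str]]) -> Optional[str]:
    """Most frequent non-None answer; ties go to the answer seen first."""
    valid = [a for a in answers if a is not None]
    if not valid:
        return None
    return Counter(valid).most_common(1)[0][0]


def solve_with_voting(
    problem: str,
    sample_fn: Callable[[str, int], Optional[str]],  # hypothetical model call
    n_samples: int = 8,
) -> Optional[str]:
    """Draw n_samples independent solutions (one per seed) and vote on answers."""
    answers = [sample_fn(problem, seed) for seed in range(n_samples)]
    return majority_vote(answers)


if __name__ == "__main__":
    # Toy sampler standing in for the model: right 5 times out of 8.
    def toy_sampler(problem: str, seed: int) -> Optional[str]:
        return "42" if seed % 8 < 5 else ["41", "43", None][seed % 3]

    print(solve_with_voting("toy problem", toy_sampler))
```

Voting only works when answers can be compared directly. For full-proof problems such as those at IMO or USAMO, a selection step would instead need a way to judge proofs, for example the self-verification behavior mentioned in the summary, which this sketch does not cover.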

Yafu Li
The Chinese University of Hong Kong
Reasoning, Trustworthy AI, Multilinguality

Runzhe Zhan
Ph.D. Candidate, University of Macau
Machine Translation, Language Models, Multilinguality

Haoran Zhang
Shanghai Jiao Tong University
Language Model, Security, Reasoning

Shunkai Zhang
Shanghai AI Laboratory

Yizhuo Li
The University of Hong Kong

Zhilin Wang
Shanghai AI Laboratory

Jiacheng Chen
The Chinese University of Hong Kong
Natural Language Processing, Reinforcement Learning, Optimization

Futing Wang
Shanghai AI Laboratory

Xuyang Hu
Shanghai AI Laboratory

Yuchen Fan
Shanghai AI Laboratory & Shanghai Jiao Tong University
NLP, Large Language Models, Evaluation

Bangjie Xu
Tsinghua University

Yucheng Su
Tsinghua University

Xinmiao Han
Tsinghua University

Chenxi Li
Shanghai AI Laboratory

Haodi Lei
Shanghai AI Laboratory

Yufeng Zhao
Shanghai AI Laboratory

Zejin Lin
Tsinghua University

Qianjia Cheng
Shanghai AI Lab

Tong Zhu
Shanghai AI Laboratory

Xiaoye Qu
Shanghai AI Lab

Ganqu Cui
Shanghai AI Lab
LLM Alignment, Reinforcement Learning

Peng Ye
Shanghai AI Laboratory

Yun Luo
Shanghai AI Lab
natural language processing, graph neural network

Zhouchen Lin
Professor, Peking University; Fellow of IEEE, IAPR, CSIG & AAIA; ex-VP of Samsung Research
machine learning, computer vision, image processing, numerical optimization

Yu Qiao
Professor of Shanghai AI Laboratory; Shenzhen Institutes of Advanced Technology, CAS
Computer Vision, Pattern Recognition, Large Multimodal Model, Large Language Model