Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling

📅 2026-05-13

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work proposes a general-purpose reasoning system that targets gold-medal performance at international olympiads such as the International Mathematical Olympiad (IMO) and the International Physics Olympiad (IPhO). Its unified training recipe combines a reverse-perplexity curriculum for supervised fine-tuning, two-stage reinforcement learning (RL with verifiable rewards followed by proof-level RL), and test-time scaling. Together these strengthen the model's capacity for rigorous proof search and self-verification and support stable long-horizon reasoning. The resulting model, SU-01, reaches gold-medal-level performance on IMO 2025, USAMO 2026, and IPhO 2024/2025, sustains reasoning trajectories exceeding 100,000 tokens, and generalizes to scientific domains beyond mathematics and physics.

📝 Abstract

Recent progress in reasoning models has substantially advanced long-horizon mathematical and scientific problem solving, with several systems now reaching gold-medal-level performance on International Mathematical Olympiad (IMO) and International Physics Olympiad (IPhO) problems. In this paper, we introduce a simple and unified recipe for converting a post-trained reasoning backbone into a rigorous olympiad-level solver. The recipe first uses a reverse-perplexity curriculum for SFT to instill rigorous proof-search and self-checking behaviors, then scales these behaviors through a two-stage RL pipeline that progresses from RL with verifiable rewards to more delicate proof-level RL, and finally boosts solving performance with test-time scaling. Applying this recipe, we train a 30B-A3B backbone with SFT on around 340K sub-8K-token trajectories followed by 200 RL steps. The resulting model, SU-01, supports stable reasoning on difficult problems with trajectories exceeding 100K tokens, while achieving gold-medal-level performance on mathematical and physical olympiad competitions, including IMO 2025/USAMO 2026 and IPhO 2024/2025. It also demonstrates strong generalization of scientific reasoning to domains beyond mathematics and physics.
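The abstract names a reverse-perplexity curriculum but does not spell out how samples are ordered. As a rough illustration only, the sketch below assumes that "reverse" means presenting SFT trajectories from highest to lowest perplexity under the backbone; that reading, the sample format, and the function names `perplexity` and `reverse_perplexity_order` are all assumptions made here, not details taken from the paper.

```python
import math


def perplexity(token_logprobs):
    """Perplexity = exp of the mean negative log-likelihood per token."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def reverse_perplexity_order(samples):
    """Sort samples from highest to lowest perplexity.

    ASSUMPTION: this ordering is one plausible reading of
    "reverse-perplexity curriculum"; the paper may define it differently.
    """
    return sorted(samples, key=lambda s: perplexity(s["logprobs"]), reverse=True)


# Hypothetical trajectories with per-token log-probabilities that would,
# in practice, come from scoring each trajectory with the backbone model.
samples = [
    {"id": "a", "logprobs": [-0.1, -0.2, -0.1]},
    {"id": "b", "logprobs": [-1.5, -2.0, -1.0]},
    {"id": "c", "logprobs": [-0.6, -0.4, -0.5]},
]

ordered_ids = [s["id"] for s in reverse_perplexity_order(samples)]
print(ordered_ids)
```

In this toy data, trajectory "b" has the largest mean negative log-likelihood and is therefore scheduled first. A real pipeline would also need to decide how to batch, filter, and length-limit trajectories (the abstract mentions sub-8K-token trajectories), none of which is modeled here.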

Problem

Research questions and friction points this paper is trying to address.

Olympiad reasoning

mathematical problem solving

scientific reasoning

long-horizon reasoning

AI reasoning

Innovation

Methods, ideas, or system contributions that make the work stand out.

reverse-perplexity curriculum

two-stage reinforcement learning

proof-level RL