AIMO-2 Winning Solution: Building State-of-the-Art Mathematical Reasoning Models with OpenMathReasoning dataset

📅 2025-04-23

🤖 AI Summary

This work addresses the scarcity of large-scale, high-quality long-reasoning data for olympiad-level mathematics and the limits of majority voting for choosing among sampled solutions. It makes three contributions: (1) a dataset of 540K unique math problems, including olympiad-level problems, with 3.2M long-reasoning solutions; (2) a method for integrating code execution with long-reasoning models through iterative training, generation, and quality filtering, yielding 1.7M Tool-Integrated Reasoning solutions; and (3) a generative solution selection (GenSelect) pipeline that trains models to pick the most promising solution from many candidates and significantly improves on the majority voting baseline. Models trained with this recipe achieve state-of-the-art results on mathematical reasoning benchmarks. The code, models, and the complete OpenMathReasoning dataset are released under a commercially permissive license.

📝 Abstract

This paper presents our winning submission to the AI Mathematical Olympiad - Progress Prize 2 (AIMO-2) competition. Our recipe for building state-of-the-art mathematical reasoning models relies on three key pillars. First, we create a large-scale dataset comprising 540K unique high-quality math problems, including olympiad-level problems, and their 3.2M long-reasoning solutions. Second, we develop a novel method to integrate code execution with long-reasoning models through iterative training, generation, and quality filtering, resulting in 1.7M high-quality Tool-Integrated Reasoning solutions. Third, we create a pipeline to train models to select the most promising solution from many candidates. We show that such generative solution selection (GenSelect) can significantly improve upon the majority voting baseline. Combining these ideas, we train a series of models that achieve state-of-the-art results on mathematical reasoning benchmarks. To facilitate further research, we release our code, models, and the complete OpenMathReasoning dataset under a commercially permissive license.
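
To make the contrast between majority voting and generative selection concrete, here is a minimal sketch. It is not the authors' implementation: the `Candidate` type, the `gen_select` function, and the `toy_selector` are hypothetical names introduced for illustration, and the toy selector (which simply prefers the longest solution text) stands in for a trained selection model.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Candidate:
    """One sampled solution and the final answer extracted from it (hypothetical type)."""
    solution: str
    answer: str


def majority_vote(answers: Sequence[str]) -> str:
    """Baseline: return the most frequent answer; ties go to the earliest one."""
    counts = Counter(answers)
    return max(counts, key=lambda a: (counts[a], -answers.index(a)))


# A selector reads the problem and all candidates and returns the index it prefers.
Selector = Callable[[str, Sequence[Candidate]], int]


def gen_select(problem: str, candidates: Sequence[Candidate], selector: Selector) -> Candidate:
    """Selection-based alternative: let a selector pick one candidate outright."""
    idx = selector(problem, candidates)
    if not 0 <= idx < len(candidates):
        raise ValueError(f"selector returned invalid index {idx}")
    return candidates[idx]


def toy_selector(problem: str, candidates: Sequence[Candidate]) -> int:
    """Assumption for illustration only: prefer the longest solution text."""
    return max(range(len(candidates)), key=lambda i: len(candidates[i].solution))
```

The design point is that majority voting only sees final answers, while a selector sees the full candidate solutions and can therefore pick an answer that is in the minority.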

Problem

Research questions and friction points this paper is trying to address.

Building a large-scale, high-quality math dataset with long-reasoning solutions

Integrating code execution with long-reasoning models through iterative training

Training models to select the best solution from many candidates

Innovation

Methods, ideas, or system contributions that make the work stand out.

Large-scale dataset with 540K math problems

Code execution integrated with reasoning models

Generative solution selection pipeline
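
As a rough illustration of what integrating code execution with a reasoning model can look like, the sketch below runs the last Python block found in a piece of model output and appends the captured result so that generation could continue from it. This shows the general tool-integrated pattern only, not the paper's pipeline: the fence convention, the `output` block label, and the helper names `run_snippet` and `execute_last_block` are assumptions, and `exec` is used here without the sandboxing a real system would need.

```python
import contextlib
import io
import re

FENCE = "`" * 3  # built programmatically so this example nests cleanly
CODE_RE = re.compile(FENCE + r"python\n(.*?)" + FENCE, re.DOTALL)


def run_snippet(code: str) -> str:
    """Execute a code string and return its stdout, or a one-line error message."""
    buf = io.StringIO()
    try:
        with contextlib.redirect_stdout(buf):
            exec(code, {})  # unsafe outside a sandbox; illustration only
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return buf.getvalue().strip()


def execute_last_block(model_text: str) -> str:
    """Append the output of the last Python block, if any, to the model text."""
    blocks = CODE_RE.findall(model_text)
    if not blocks:
        return model_text  # nothing to execute; return the text unchanged
    output = run_snippet(blocks[-1])
    return f"{model_text}\n{FENCE}output\n{output}\n{FENCE}\n"
```

In a full loop, the returned text would be fed back to the model as context for the next reasoning step, with limits on the number of executions and on output length.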
