Big-Math: A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models

📅 2025-02-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing open math datasets force a trade-off between quality and scale: high-quality human-written problems are scarce, while machine-generated problems are of uncertain quality, limiting their usefulness for reinforcement learning (RL) with language models. To address this, the authors present Big-Math, a dataset of over 250,000 high-quality math problems with verifiable answers, purpose-built for RL. Problems are drawn from openly available datasets and rigorously filtered against three desiderata: uniquely verifiable solutions, open-ended phrasing, and closed-form answers, with each filtering step manually verified. A systematic reformulation algorithm additionally converts closed-ended (e.g., multiple-choice) questions into 47,000 new open-ended problems with verified answers, released as Big-Math-Reformulated. Big-Math is an order of magnitude larger than GSM8k and MATH, spans diverse problem domains and a broad difficulty range, and supports models of varying capabilities and training requirements.

📝 Abstract
Increasing interest in reasoning models has led math to become a prominent testing ground for algorithmic and methodological improvements. However, existing open math datasets either contain a small collection of high-quality, human-written problems or a large corpus of machine-generated problems of uncertain quality, forcing researchers to choose between quality and quantity. In this work, we present Big-Math, a dataset of over 250,000 high-quality math questions with verifiable answers, purposefully made for reinforcement learning (RL). To create Big-Math, we rigorously filter, clean, and curate openly available datasets, extracting questions that satisfy our three desiderata: (1) problems with uniquely verifiable solutions, (2) problems that are open-ended, and (3) problems with a closed-form solution. To ensure the quality of Big-Math, we manually verify each step in our filtering process. Based on the findings from our filtering process, we introduce 47,000 new questions with verified answers, Big-Math-Reformulated: closed-ended questions (i.e., multiple-choice questions) that have been reformulated as open-ended questions through a systematic reformulation algorithm. Compared to the most commonly used existing open-source datasets for math reasoning, GSM8k and MATH, Big-Math is an order of magnitude larger, while our rigorous filtering ensures that we maintain the questions most suitable for RL. We also provide a rigorous analysis of the dataset, finding that Big-Math contains a high degree of diversity across problem domains and incorporates a wide range of problem difficulties, enabling downstream uses for models of varying capabilities and training requirements. By bridging the gap between data quality and quantity, Big-Math establishes a robust foundation for advancing reasoning in LLMs.
Problem

Research questions and friction points this paper is trying to address.

Develops a large-scale math dataset for reinforcement learning
Ensures high-quality, verifiable math problems
Bridges the gap between data quality and quantity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Big-Math: Large-scale, high-quality math dataset.
Reinforcement learning with verifiable math problems.
Manual verification ensures dataset integrity.