🤖 AI Summary
High-quality training data for mathematical reasoning in large language models (LLMs) remains scarce, uneven in quality, and costly to construct at scale. Method: This paper proposes a program-driven mathematical data synthesis framework that integrates formal mathematical knowledge systems with domain-specific tools to automatically generate executable programs, which are then converted into natural-language question-answer pairs. A novel bilateral verification mechanism jointly validates both program logic and natural-language solutions for consistency and correctness. Contribution/Results: The framework produces 12.3 million high-quality, diverse, and complex mathematical reasoning triples, significantly improving model generalization. Fine-tuned models achieve state-of-the-art performance on the GSM8K and MATH benchmarks. This work establishes a systematic, scalable, and trustworthy paradigm for synthesizing mathematically rigorous reasoning data.
📝 Abstract
Enhancing the mathematical reasoning of large language models (LLMs) demands high-quality training data, yet conventional methods face critical challenges in scalability, cost, and data reliability. To address these limitations, we propose a novel program-assisted synthesis framework that systematically generates a high-quality mathematical corpus with guaranteed diversity, complexity, and correctness. This framework integrates mathematical knowledge systems and domain-specific tools to create executable programs. These programs are then translated into natural language problem-solution pairs and vetted by a bilateral verification mechanism that checks solution correctness against program outputs and ensures program-problem consistency. We have generated 12.3 million such problem-solving triples. Experiments demonstrate that models fine-tuned on our data significantly improve their reasoning capabilities, achieving state-of-the-art performance on several benchmark datasets and showcasing the effectiveness of our synthesis approach.
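The bilateral verification step described above can be sketched as follows. This is a minimal illustrative toy, not the paper's implementation: it assumes one check compares the natural-language answer against the output of the executed program, and a second check confirms that the quantities the program uses actually appear in the problem statement. All function names and the example triple are hypothetical.

```python
import contextlib
import io


def run_program(program_src: str) -> str:
    """Execute a synthesized solver program and capture its printed answer."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(program_src, {})  # illustrative only; sandbox untrusted code in practice
    return buf.getvalue().strip()


def bilateral_verify(program_src: str, nl_problem: str, nl_answer: str,
                     required_facts: list[str]) -> bool:
    """Accept a (program, problem, solution) triple only if both checks pass."""
    # Check 1 (solution correctness): the natural-language answer
    # must match the program's executed output.
    answer_ok = run_program(program_src) == nl_answer.strip()
    # Check 2 (program-problem consistency): every quantity the program
    # relies on must occur in the problem statement.
    consistent = all(fact in nl_problem for fact in required_facts)
    return answer_ok and consistent


# Hypothetical synthesized triple.
program = "a, b = 12, 7\nprint(a * b)"
problem = "A crate holds 12 boxes with 7 apples each. How many apples in total?"
print(bilateral_verify(program, problem, "84", ["12", "7"]))  # True: both checks pass
```

In a real pipeline both checks would be stricter (sandboxed execution, numeric-answer parsing, semantic rather than substring matching), but the two-sided structure, validating the answer and the problem against the program, is the point being illustrated.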