🤖 AI Summary
High-quality training data for mathematical reasoning in large language models (LLMs) remains scarce, uneven in quality, and costly to construct at scale. Method: This paper proposes a program-driven mathematical data synthesis framework that integrates formal mathematical knowledge systems with domain-specific tools to automatically generate executable programs, which are then converted into natural-language question-answer pairs. A novel bilateral verification mechanism jointly validates both program logic and natural-language solutions for consistency and correctness. Contribution/Results: The framework produces 12.3 million high-quality, diverse, and complex mathematical reasoning triples, significantly improving model generalization. Fine-tuned models achieve state-of-the-art performance on the GSM8K and MATH benchmarks. This work establishes a systematic, scalable, and trustworthy paradigm for synthesizing mathematically rigorous reasoning data.
📝 Abstract
Enhancing the mathematical reasoning of large language models (LLMs) demands high-quality training data, yet conventional methods face critical challenges in scalability, cost, and data reliability. To address these limitations, we propose a novel program-assisted synthesis framework that systematically generates a high-quality mathematical corpus with guaranteed diversity, complexity, and correctness. This framework integrates mathematical knowledge systems and domain-specific tools to create executable programs. These programs are then translated into natural language problem-solution pairs and vetted by a bilateral verification mechanism that checks solution correctness against program outputs and ensures program-problem consistency. We have generated 12.3 million such problem-solving triples. Experiments demonstrate that models fine-tuned on our data significantly improve their reasoning capabilities, achieving state-of-the-art performance on several benchmark datasets and showcasing the effectiveness of our synthesis approach.
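The bilateral verification step described above can be sketched as follows. This is a minimal illustrative toy, not the paper's implementation: it assumes one check compares the natural-language answer against the output of the executed program, and a second check confirms that the quantities the program uses actually appear in the problem statement. All function names and the example triple are hypothetical.

```python
import contextlib
import io


def run_program(program_src: str) -> str:
    """Execute a synthesized solver program and capture its printed answer."""
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(program_src, {})  # illustrative only; sandbox untrusted code in practice
    return buf.getvalue().strip()


def bilateral_verify(program_src: str, nl_problem: str, nl_answer: str,
                     required_facts: list[str]) -> bool:
    """Accept a (program, problem, solution) triple only if both checks pass."""
    # Check 1 (solution correctness): the natural-language answer
    # must match the program's executed output.
    answer_ok = run_program(program_src) == nl_answer.strip()
    # Check 2 (program-problem consistency): every quantity the program
    # relies on must occur in the problem statement.
    consistent = all(fact in nl_problem for fact in required_facts)
    return answer_ok and consistent


# Hypothetical synthesized triple.
program = "a, b = 12, 7\nprint(a * b)"
problem = "A crate holds 12 boxes with 7 apples each. How many apples in total?"
print(bilateral_verify(program, problem, "84", ["12", "7"]))  # True: both checks pass
```

In a real pipeline both checks would be stricter (sandboxed execution, numeric-answer parsing, semantic rather than substring matching), but the two-sided structure, validating the answer and the problem against the program, is the point being illustrated.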