MathSmith: Towards Extremely Hard Mathematical Reasoning by Forging Synthetic Problems with a Reinforced Policy

📅 2025-08-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
High-quality, high-difficulty mathematical training data remains scarce, hindering the advancement of complex reasoning capabilities in large language models (LLMs). To address this, we propose the first zero-shot framework for generating diverse, high-difficulty mathematical problems grounded in formal concept explanations. Our method samples concepts from PlanetMath, employs autoregressive chain-of-thought generation augmented with nine soft reasoning constraints, and jointly optimizes structural validity, cognitive complexity, and answer consistency via reinforcement learning. We further introduce a weakness-focused variant generation strategy to enhance data specificity. Evaluated across five difficulty-stratified benchmarks, our approach consistently outperforms existing synthetic data methods—delivering significant gains in mathematical reasoning performance under both short- and long-chain reasoning settings. The framework demonstrates strong scalability and cross-task transferability, establishing a new state of the art in LLM-driven mathematical reasoning data synthesis.
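The summary describes a reward that jointly scores structural validity, cognitive complexity (proxied by the length of the autoregressively generated reasoning trace), and answer consistency. A minimal sketch of how such a composite reward might be combined into a single scalar is shown below; the function name, weights, and saturation length are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of the multi-objective reward described in the summary.
# All names, weights, and thresholds are illustrative assumptions.

def composite_reward(is_well_formed: bool,
                     trace_length: int,
                     answers_agree: bool,
                     target_length: int = 2048,
                     w_valid: float = 0.3,
                     w_complex: float = 0.4,
                     w_consist: float = 0.3) -> float:
    """Combine three signals into one scalar reward in [0, 1]."""
    r_valid = 1.0 if is_well_formed else 0.0
    # Longer reasoning traces are treated as harder problems,
    # saturating once the trace reaches target_length tokens.
    r_complex = min(trace_length / target_length, 1.0)
    r_consist = 1.0 if answers_agree else 0.0
    return w_valid * r_valid + w_complex * r_complex + w_consist * r_consist
```

A policy-gradient update would then use this scalar as the per-problem reward; the weighting between validity and complexity is a design choice the paper tunes via reinforcement learning rather than fixing by hand.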

📝 Abstract
Large language models have achieved substantial progress in mathematical reasoning, yet their advancement is limited by the scarcity of high-quality, high-difficulty training data. Existing synthesis methods largely rely on transforming human-written templates, limiting both diversity and scalability. We propose MathSmith, a novel framework for synthesizing challenging mathematical problems to enhance LLM reasoning. Rather than modifying existing problems, MathSmith constructs new ones from scratch by randomly sampling concept-explanation pairs from PlanetMath, ensuring data independence and avoiding contamination. To increase difficulty, we design nine predefined strategies as soft constraints during rationale construction. We further adopt reinforcement learning to jointly optimize structural validity, reasoning complexity, and answer consistency. The length of the reasoning trace generated under autoregressive prompting is used to reflect cognitive complexity, encouraging the creation of more demanding problems aligned with long-chain-of-thought reasoning. Experiments across five benchmarks, categorized as easy & medium (GSM8K, MATH-500) and hard (AIME2024, AIME2025, OlympiadBench), show that MathSmith consistently outperforms existing baselines under both short and long CoT settings. Additionally, a weakness-focused variant generation module enables targeted improvement on specific concepts. Overall, MathSmith exhibits strong scalability, generalization, and transferability, highlighting the promise of high-difficulty synthetic data in advancing LLM reasoning capabilities.
Problem

Research questions and friction points this paper is trying to address.

Generates extremely hard math problems from scratch
Enhances LLM reasoning with high-difficulty synthetic data
Optimizes problem complexity via reinforcement learning strategies
Innovation

Methods, ideas, or system contributions that make the work stand out.

Constructs new problems from scratch using PlanetMath
Uses reinforcement learning to optimize problem quality
Generates long reasoning traces to increase difficulty
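The weakness-focused variant generation mentioned above targets concepts the model handles poorly. One plausible way to realize this is to sample concepts for new problems in proportion to the model's observed error rate on each concept; the sketch below illustrates that idea. The error-rate bookkeeping and all names are assumptions for illustration, not the paper's code.

```python
import random

# Illustrative sketch of weakness-focused concept sampling: concepts on which
# the model errs more often are drawn more frequently for new problems.
# error_rates maps a concept name to its observed failure rate in [0, 1].

def sample_weak_concept(error_rates: dict, rng: random.Random) -> str:
    """Sample a concept with probability proportional to its error rate."""
    concepts = list(error_rates)
    weights = [error_rates[c] for c in concepts]
    return rng.choices(concepts, weights=weights, k=1)[0]
```

Feeding the sampled concept (with its PlanetMath explanation) back into the generator would then concentrate new hard problems on the model's weakest areas.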
Shaoxiong Zhan, Tsinghua University (Natural Language Processing, Large Language Models)
Yanlin Lai, Tsinghua University
Ziyu Lu, Tsinghua University
Dahua Lin, The Chinese University of Hong Kong (computer vision, machine learning, probabilistic inference, Bayesian nonparametrics)
Ziqing Yang, SenseTime Research
Fei Tang, East China Normal University