Learning to Pose Problems: Reasoning-Driven and Solver-Adaptive Data Synthesis for Large Reasoning Models

📅 2025-11-13
🤖 AI Summary
Existing data synthesis methods for large reasoning models suffer from two key limitations: (1) non-selective question generation, leading to poor alignment with solver capabilities; and (2) superficial reasoning, yielding shallow problem variants. To address these, the authors propose a reasoning-driven, solver-adaptive question generation framework. The method constructs semantically related question pairs, incorporates chain-of-thought (CoT)-guided intermediate reasoning steps, and dynamically adjusts question difficulty using solver feedback as a reinforcement signal, enabling co-evolution of generator and solver. The approach integrates large-model CoT reasoning, question-pair construction, reinforcement-learning-inspired feedback, and data self-bootstrapping. Evaluated on ten mathematical and general reasoning benchmarks, it achieves an average +2.5% improvement across both language and vision-language models. Further, training the solver on the synthesized data provides improved rewards for continued generator training, yielding an additional +0.7% gain.

📝 Abstract
Data synthesis for training large reasoning models offers a scalable alternative to limited, human-curated datasets, enabling the creation of high-quality data. However, existing approaches face several challenges: (i) indiscriminate generation that ignores the solver's ability and yields low-value problems, or reliance on complex data pipelines to balance problem difficulty; and (ii) a lack of reasoning in problem generation, leading to shallow problem variants. In this paper, we develop a problem generator that reasons explicitly to plan problem directions before synthesis and adapts difficulty to the solver's ability. Specifically, we construct related problem pairs and augment them with intermediate problem-design CoT produced by a reasoning model. These data bootstrap problem-design strategies from the generator. Then, we treat the solver's feedback on synthetic problems as a reward signal, enabling the generator to calibrate difficulty and produce complementary problems near the edge of the solver's competence. Extensive experiments on 10 mathematical and general reasoning benchmarks show that our method achieves an average improvement of 2.5% and generalizes to both language and vision-language models. Moreover, a solver trained on the synthesized data provides improved rewards for continued generator training, enabling co-evolution and yielding a further 0.7% performance gain. Our code will be made publicly available.
Problem

Research questions and friction points this paper is trying to address.

Indiscriminate problem generation that ignores solver ability and yields low-value data
Lack of reasoning in problem generation, leading to shallow problem variants
Difficulty in balancing problem difficulty without resorting to complex data pipelines
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generator plans problem directions before synthesis
Generator adapts difficulty to solver's ability
Uses solver feedback as reward for calibration
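The paper's exact reward function is not reproduced on this page; a minimal sketch of the general idea described above, where synthetic problems are rewarded when the solver's empirical pass rate lands near the edge of its competence, might look like the following (the `target` difficulty, the sampling count, and the stub solver are all hypothetical choices, not the paper's):

```python
import itertools

def difficulty_reward(solver_pass_rate: float, target: float = 0.5) -> float:
    """Hypothetical reward shaping: highest when the solver's pass rate
    on a synthetic problem is near a target difficulty, lowest when the
    problem is trivially easy or unsolvably hard."""
    return 1.0 - abs(solver_pass_rate - target) / max(target, 1.0 - target)

def estimate_pass_rate(solver, problem, answer, n_samples: int = 8) -> float:
    """Sample the solver several times on one problem and count correct
    answers; `solver` is any callable returning a candidate answer."""
    correct = sum(solver(problem) == answer for _ in range(n_samples))
    return correct / n_samples

# Toy usage with a stub solver that answers correctly every other call,
# giving a pass rate of 0.5 and therefore the maximum reward of 1.0.
answers = itertools.cycle([42, 0])
stub_solver = lambda problem: next(answers)
rate = estimate_pass_rate(stub_solver, "what is 2 * 21?", 42)   # 0.5
reward = difficulty_reward(rate)                                 # 1.0
```

Under this shaping, problems the solver always gets right (pass rate 1.0) or always gets wrong (pass rate 0.0) earn zero reward, pushing the generator toward complementary problems at the boundary of the solver's ability, which matches the co-evolution loop summarized above.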
Yongxian Wei — Tsinghua University (Machine Learning)
Yilin Zhao — Tencent QQ
Li Shen — Sun Yat-sen University
Xinrui Chen — Tsinghua University (Efficient Deep Learning, Computer Vision)
Runxi Cheng — Tsinghua University
Sinan Du — Tsinghua University
Hao Yu — Tsinghua University
Gang Liu — Tencent QQ
Jiahong Yan — Tencent QQ
Chun Yuan — Tsinghua University
Dian Li — Tencent.com (MLLM, video understanding, self-supervised learning, vision-language)