Agentic Proposing: Enhancing Large Language Model Reasoning via Compositional Skill Synthesis

📅 2026-02-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the scarcity of high-quality, verifiable training data and the high cost of manual annotation in complex reasoning tasks by proposing the Agentic Proposing framework, which formulates problem synthesis as a goal-driven sequential decision-making process. By dynamically composing modular reasoning skills and integrating internal reflection with tool invocation, the framework generates precise, verifiable training trajectories. It further introduces Multi-Granularity Policy Optimization (MGPO) to improve synthesis efficiency. A 30B-parameter model trained on only 11,000 such synthetic trajectories achieves 91.6% accuracy on AIME25, matching the performance of GPT-5 and significantly outperforming existing baselines. These results demonstrate that small-scale, high-quality synthetic data can effectively substitute for large-scale human-annotated datasets while generalizing strongly across domains.
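The propose-verify-reflect loop described in the summary can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's actual implementation: the skill names, the `compose_problem` and `verify` placeholders, and the loop structure are all hypothetical stand-ins for the agent's LLM-driven composition and tool-based verification.

```python
import random

# Hypothetical pool of modular reasoning skills the agent composes.
SKILLS = ["modular_arithmetic", "telescoping_sum", "pigeonhole"]

def compose_problem(skills):
    # Placeholder: a real proposer would use an LLM to weave the
    # selected skills into one coherent problem statement.
    return {"skills": skills, "statement": " + ".join(skills)}

def verify(problem):
    # Placeholder verifier standing in for tool invocation
    # (e.g. symbolic solving or code execution) that checks
    # the candidate is consistent and solvable.
    return len(problem["skills"]) <= 2

def propose(max_steps=5, seed=0):
    """Goal-driven sequential proposing: select skills, compose,
    verify; failed attempts inform the next draft (reflection)."""
    rng = random.Random(seed)
    trajectory = []
    for _ in range(max_steps):
        k = rng.randint(1, 3)
        candidate = compose_problem(rng.sample(SKILLS, k))
        ok = verify(candidate)
        trajectory.append((candidate, ok))  # logged as training signal
        if ok:
            return candidate, trajectory    # verified problem found
        # otherwise reflect and retry with a new skill composition
    return None, trajectory

problem, traj = propose()
```

The key design point mirrored here is that every emitted problem passes an explicit verification step, so the resulting trajectories are verifiable by construction rather than filtered after the fact.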

📝 Abstract
Advancing complex reasoning in large language models relies on high-quality, verifiable datasets, yet human annotation remains cost-prohibitive and difficult to scale. Current synthesis paradigms often face a recurring trade-off: maintaining structural validity typically restricts problem complexity, while relaxing constraints to increase difficulty frequently leads to inconsistent or unsolvable instances. To address this, we propose Agentic Proposing, a framework that models problem synthesis as a goal-driven sequential decision process where a specialized agent dynamically selects and composes modular reasoning skills. Through an iterative workflow of internal reflection and tool-use, we develop the Agentic-Proposer-4B using Multi-Granularity Policy Optimization (MGPO) to generate high-precision, verifiable training trajectories across mathematics, coding, and science. Empirical results demonstrate that downstream solvers trained on agent-synthesized data significantly outperform leading baselines and exhibit robust cross-domain generalization. Notably, a 30B solver trained on only 11,000 synthesized trajectories achieves a state-of-the-art 91.6% accuracy on AIME25, rivaling frontier-scale proprietary models such as GPT-5 and proving that a small volume of high-quality synthetic signals can effectively substitute for massive human-curated datasets.
Problem

Research questions and friction points this paper is trying to address.

complex reasoning
synthetic data
large language models
data scalability
verifiable datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Agentic Proposing
Compositional Skill Synthesis
Multi-Granularity Policy Optimization
Synthetic Data Generation
Verifiable Reasoning Trajectories