Knowing What to Solve Before How: Preplan Empowered LLM Mathematical Reasoning

📅 2026-05-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a critical limitation in existing planning-based mathematical reasoning approaches, which often neglect explicit modeling of *what* to solve and focus solely on *how* to solve it, leading to misinterpretations of the problem. To remedy this, we propose the PPC framework, which introduces a dedicated “pre-planning” stage to explicitly capture the essence of the problem, thereby establishing a novel paradigm: problem → pre-planning → planning → chain-of-thought. The framework ensures conceptual integrity of pre-planning and logical consistency across stages through three-stage synthetic data generation, Spoiler-Score leakage detection, and a composite GRPO reward mechanism. Evaluated across four backbone models and five mathematical benchmarks, PPC achieves state-of-the-art performance on 39 out of 40 metrics, improving maj@16 and pass@16 by 2.23 and 3.06 points respectively, without incurring additional inference overhead.
📝 Abstract
Current plan-based reasoning methods improve large language models (LLMs) by inserting a planning stage before execution, giving rise to the question $\rightarrow$ plan $\rightarrow$ cot paradigm. While these methods are effective, a closer examination reveals an inherent paradigm-level gap: both the planning stage and the execution stage decide how to solve a problem, while the prior question of what to solve (recognizing the problem type, the applicable tools, and the foreseeable pitfalls) remains entirely implicit. To bridge this gap, we propose PPC (Preplan-Plan-CoT), a framework that introduces an explicit problem-understanding stage, the preplan, yielding a new question $\rightarrow$ preplan $\rightarrow$ plan $\rightarrow$ cot paradigm. Realizing this paradigm requires safeguarding the conceptual integrity of the preplan at both ends. Specifically, we design a three-stage synthesis pipeline with a spoiler-score detector that filters out leakage and spoiler failures to build clean preplan supervision, and a composite GRPO reward that enforces that the generated plan genuinely follows from the preplan. Experiments across four backbones and five mathematical reasoning benchmarks show that PPC achieves the best results on 39 of 40 metrics, improving maj@16 and pass@16 by +2.23 and +3.06 points over the strongest baseline without introducing additional inference token overhead.
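The abstract reports maj@16 and pass@16 without spelling out how they are computed. The sketch below shows one common reading of these metrics, assuming each problem has k sampled final answers that are compared to a reference by exact string match: maj@k counts a problem as solved when the most frequent sampled answer is correct, and pass@k counts it as solved when any sample is correct. The function names and the exact-match comparison are illustrative assumptions, not the paper's evaluation protocol.

```python
from collections import Counter


def maj_at_k(samples, reference):
    """Return 1.0 if the most frequent sampled answer equals the reference.

    Assumption: answers are already normalized strings; ties are broken in
    favor of the answer that appeared first among the samples.
    """
    if not samples:
        return 0.0
    top_answer, _ = Counter(samples).most_common(1)[0]
    return float(top_answer == reference)


def pass_at_k(samples, reference):
    """Return 1.0 if at least one sampled answer equals the reference."""
    return float(any(s == reference for s in samples))


def benchmark_scores(all_samples, references):
    """Average both metrics over a benchmark and report them in points (0-100)."""
    n = len(references)
    maj = 100.0 * sum(maj_at_k(s, r) for s, r in zip(all_samples, references)) / n
    pas = 100.0 * sum(pass_at_k(s, r) for s, r in zip(all_samples, references)) / n
    return maj, pas


if __name__ == "__main__":
    # Two toy problems with a handful of samples each (k would be 16 in the paper).
    toy_samples = [["4", "4", "5", "4"], ["3", "3", "4"]]
    toy_refs = ["4", "4"]
    print(benchmark_scores(toy_samples, toy_refs))
```

Under this reading, pass@k is always at least as large as maj@k on the same samples, since a correct majority answer implies at least one correct sample.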
Problem

Research questions and friction points this paper is trying to address.

problem understanding
mathematical reasoning
large language models
planning paradigm
preplan
Innovation

Methods, ideas, or system contributions that make the work stand out.

preplan
mathematical reasoning
LLM planning
spoiler-score detector
GRPO reward