🤖 AI Summary
Small-scale open-source large language models (LLMs) lack explicit planning capabilities, hindering their reasoning performance and generalization on complex problem-solving tasks.
Method: We propose a unified post-training framework that distills synthetic planning trajectories—i.e., task decomposition paths—generated by stronger LMs. The framework jointly employs supervised learning to imitate stepwise decomposition and reinforcement learning to optimize final answer correctness, thereby inducing step-by-step planning behavior in smaller models without architectural modifications or inference-time overhead.
Contribution/Results: Our approach significantly enhances complex reasoning: it outperforms strong baselines by an average of 7% on GSM8K and MATH, and achieves ~10% and ~12% gains on OlympiadBench and AIME 2024, respectively. These results demonstrate the effectiveness and cross-domain generalizability of planning-structure distillation for boosting small-model reasoning.
📝 Abstract
Recently, decomposing a complex problem into simple subtasks, a crucial part of human-like natural planning, has significantly boosted the performance of large language models (LLMs). However, leveraging such planning structures during post-training to boost the performance of smaller open-source LLMs remains underexplored. Motivated by this, we introduce PLAN-TUNING, a unified post-training framework that (i) distills synthetic task decompositions (termed "planning trajectories") from large-scale LLMs and (ii) fine-tunes smaller models via supervised and reinforcement-learning objectives designed to mimic these planning processes to improve complex reasoning. On the GSM8K and MATH benchmarks, plan-tuned models outperform strong baselines by an average of ~7%. Furthermore, plan-tuned models show better generalization on out-of-domain datasets, with average performance improvements of ~10% and ~12% on OlympiadBench and AIME 2024, respectively. Our detailed analysis demonstrates how planning trajectories improve complex reasoning capabilities, showing that PLAN-TUNING is an effective strategy for improving the task-specific performance of smaller LLMs.
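The two training objectives can be sketched formally as follows. This is our own hedged notation, not taken from the paper: for an input problem $x$, let $z$ denote the distilled planning trajectory (task decomposition), $a$ the final solution, and $y = (z, a)$ their concatenation generated by the student model $p_\theta$.

```latex
% Supervised objective: imitate the teacher's planning trajectory and solution
% token by token (standard next-token cross-entropy over the concatenation y).
\mathcal{L}_{\mathrm{SFT}}(\theta)
  = -\sum_{t=1}^{|y|} \log p_\theta\!\left(y_t \mid x,\, y_{<t}\right)

% RL objective: maximize expected final-answer correctness under the
% student's own sampled outputs (a plausible 0/1 outcome reward).
J_{\mathrm{RL}}(\theta)
  = \mathbb{E}_{y \sim p_\theta(\cdot \mid x)}\!\left[\, r(y) \,\right],
\qquad
r(y) = \mathbb{1}\{\text{final answer in } y \text{ is correct}\}
```

The supervised term induces the stepwise decomposition behavior, while the reward term anchors that behavior to answer correctness; the exact loss weighting and RL algorithm used by PLAN-TUNING are not specified in this summary.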