Teaching LLMs to Plan: Logical Chain-of-Thought Instruction Tuning for Symbolic Planning

📅 2025-09-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) exhibit weak reasoning in symbolic planning tasks such as PDDL-based planning: they fail to reliably check action applicability, compute correct state transitions, and preserve invariants. To address this, the authors propose PDDL-Instruct, a framework that guides LLMs through a logic-aware chain of thought (CoT) that verifies preconditions, applies effects, and enforces invariants step by step. Combined with instruction fine-tuning and structured prompting, the framework enables systematic, self-correcting planning. Evaluated on multiple standard planning benchmarks, PDDL-Instruct achieves 94% plan correctness, outperforming the strongest baseline by an absolute margin of 66%. The work integrates formal logical reasoning mechanisms directly into the LLM planning pipeline, improving both the model's capability for formal symbolic planning and the interpretability of its reasoning process.

📝 Abstract
Large language models (LLMs) have demonstrated impressive capabilities across diverse tasks, yet their ability to perform structured symbolic planning remains limited, particularly in domains requiring formal representations like the Planning Domain Definition Language (PDDL). In this paper, we present a novel instruction tuning framework, PDDL-Instruct, designed to enhance LLMs' symbolic planning capabilities through logical chain-of-thought reasoning. Our approach focuses on teaching models to rigorously reason about action applicability, state transitions, and plan validity using explicit logical inference steps. By developing instruction prompts that guide models through the precise logical reasoning required to determine when actions can be applied in a given state, we enable LLMs to self-correct their planning processes through structured reflection. The framework systematically builds verification skills by decomposing the planning process into explicit reasoning chains about precondition satisfaction, effect application, and invariant preservation. Experimental results on multiple planning domains show that our chain-of-thought reasoning based instruction-tuned models are significantly better at planning, achieving planning accuracy of up to 94% on standard benchmarks, representing a 66% absolute improvement over baseline models. This work bridges the gap between the general reasoning capabilities of LLMs and the logical precision required for automated planning, offering a promising direction for developing better AI planning systems.
Problem

Research questions and friction points this paper is trying to address.

Enhancing LLMs' symbolic planning capabilities with logical reasoning
Teaching models rigorous action applicability and state transition reasoning
Bridging the gap between general reasoning and logical planning precision
Innovation

Methods, ideas, or system contributions that make the work stand out.

Logical chain-of-thought instruction tuning
PDDL-Instruct framework for symbolic planning
Self-correcting planning through structured reflection
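The core checks the paper trains models to perform -- precondition satisfaction, effect application, and goal verification -- can be sketched as ordinary STRIPS-style state-transition logic. The sketch below is illustrative only (the names `Action`, `apply`, and `validate_plan`, and the toy Blocksworld facts, are my assumptions, not the paper's implementation):

```python
# Illustrative sketch of the logical checks PDDL-Instruct teaches:
# verify preconditions, apply add/delete effects, then test the goal.
# Names and the toy Blocksworld domain are assumptions for illustration.
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    name: str
    preconditions: frozenset  # facts that must hold before the action
    add_effects: frozenset    # facts the action makes true
    del_effects: frozenset    # facts the action makes false

def apply(state: frozenset, action: Action) -> frozenset:
    """Check applicability, then compute the successor state."""
    if not action.preconditions <= state:
        missing = action.preconditions - state
        raise ValueError(f"{action.name}: unmet preconditions {sorted(missing)}")
    return (state - action.del_effects) | action.add_effects

def validate_plan(state: frozenset, plan: list, goal: frozenset) -> bool:
    """Step through the plan, verifying every transition, then test the goal."""
    for action in plan:
        state = apply(state, action)
    return goal <= state

# Toy Blocksworld: pick up block A from the table, stack it on B.
pickup_a = Action("pickup(A)",
                  frozenset({"ontable(A)", "clear(A)", "handempty"}),
                  frozenset({"holding(A)"}),
                  frozenset({"ontable(A)", "clear(A)", "handempty"}))
stack_ab = Action("stack(A,B)",
                  frozenset({"holding(A)", "clear(B)"}),
                  frozenset({"on(A,B)", "handempty", "clear(A)"}),
                  frozenset({"holding(A)", "clear(B)"}))

init = frozenset({"ontable(A)", "clear(A)", "ontable(B)",
                  "clear(B)", "handempty"})
print(validate_plan(init, [pickup_a, stack_ab], frozenset({"on(A,B)"})))  # True
```

A validator of this shape gives a plan-level correctness signal: an out-of-order plan (e.g. stacking before picking up) fails the precondition check, which is the kind of stepwise feedback the structured-reflection loop relies on.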