Understanding and Enhancing the Planning Capability of Language Models via Multi-Token Prediction

📅 2025-09-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Large language models (LLMs) struggle to learn transitive relations, which limits their path reasoning in complex planning tasks. This paper studies the Multi-Token Prediction (MTP) paradigm as a way to improve generalization to paths not seen during training. It analyzes a Transformer with a shared output head and a transfer layer, showing that the transfer layer gradually learns multi-step adjacency information, and it proposes two enhancements: (1) Next-Token Injection (NTI) and (2) a Transformer-based transfer layer. Experiments on synthetic graph navigation and the Blocksworld benchmark show improved path-planning capability. The results suggest that encoding relational structure in the model architecture improves planning generalization.

📝 Abstract

Large Language Models (LLMs) have achieved impressive performance across diverse tasks but continue to struggle with learning transitive relations, a cornerstone of complex planning. To address this issue, we investigate the Multi-Token Prediction (MTP) paradigm and its impact on transitive relation learning. We theoretically analyze the MTP paradigm using a Transformer architecture composed of a shared output head and a transfer layer. Our analysis reveals that the transfer layer gradually learns multi-step adjacency information, which in turn enables the backbone model to capture transitive reachability relations that are not directly observed in the training data, albeit with some inevitable noise in adjacency estimation. Building on this foundation, we propose two strategies to enhance the transfer layer and overall learning quality: Next-Token Injection (NTI) and a Transformer-based transfer layer. Our experiments on both synthetic graphs and the Blocksworld planning benchmark validate our theoretical findings and demonstrate that these improvements significantly enhance the model's path-planning capability. These findings deepen our understanding of how Transformers with MTP learn in complex planning tasks and provide practical strategies to overcome the transitivity bottleneck, paving the way toward structurally aware and general-purpose planning models.
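
To make the terms "multi-step adjacency" and "transitive reachability" concrete, the following minimal sketch computes both exactly on a hypothetical four-node chain graph. The graph, the function names, and the Boolean-matrix formulation are illustrative assumptions, not taken from the paper; the paper analyzes how a model learns noisy estimates of such quantities, which this exact computation does not attempt to reproduce.

```python
from itertools import product


def multi_step_adjacency(adj, k):
    """Return a Boolean matrix whose [i][j] entry is True when a walk of
    exactly k edges leads from node i to node j."""
    n = len(adj)
    # Zero steps: every node reaches only itself.
    result = [[i == j for j in range(n)] for i in range(n)]
    for _ in range(k):
        # Boolean matrix product: extend every walk by one more edge.
        result = [
            [any(result[i][m] and adj[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)
        ]
    return result


def reachability(adj):
    """Transitive closure (Warshall's algorithm): [i][j] is True when some
    path of one or more edges leads from i to j."""
    n = len(adj)
    reach = [row[:] for row in adj]
    # product() varies the first index slowest, so the intermediate node m
    # is the outermost loop, as Warshall's algorithm requires.
    for m, i, j in product(range(n), repeat=3):
        if reach[i][m] and reach[m][j]:
            reach[i][j] = True
    return reach


# Hypothetical chain graph 0 -> 1 -> 2 -> 3 (illustrative only).
adj = [
    [False, True, False, False],
    [False, False, True, False],
    [False, False, False, True],
    [False, False, False, False],
]

two_step = multi_step_adjacency(adj, 2)
reach = reachability(adj)
```

Here node 3 is reachable from node 0 even though no single edge connects them: `adj[0][3]` is false while `reach[0][3]` is true. Relations of this kind, implied by the edges but never listed directly, are what the abstract calls transitive reachability.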

Problem

Research questions and friction points this paper is trying to address.

Enhancing language models' planning via multi-token prediction

Addressing transitive relation learning limitations in LLMs

Improving path-planning capability through transfer layer strategies

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-token prediction enhances transitive relation learning

Next-token injection improves transfer layer quality

Transformer-based transfer layer captures multi-step adjacency
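
The architecture named above, a shared output head combined with a transfer layer, can be sketched as follows. This is a rough illustration under stated assumptions rather than the paper's implementation: the transfer layer is modeled as a single linear map, the helper names `matvec` and `mtp_logits` are hypothetical, the weights are toy values, and applying the transfer layer repeatedly for further offsets is an assumption. Next-Token Injection is not shown because the summary does not specify its mechanics.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def mtp_logits(hidden, transfer, head, num_future=2):
    """Produce logits for the next `num_future` tokens from one hidden state.

    hidden   : backbone representation at the current position
    transfer : transfer-layer weights (assumed linear for this sketch)
    head     : output-head weights, shared across all predicted offsets
    """
    outputs = []
    state = hidden
    for _ in range(num_future):
        # The same head decodes every offset; only the state changes.
        outputs.append(matvec(head, state))
        # The transfer layer advances the representation by one step.
        state = matvec(transfer, state)
    return outputs


# Toy weights: 2-dimensional hidden state, 3-token vocabulary (illustrative).
hidden = [1.0, 0.0]
transfer = [[0.0, 1.0], [1.0, 0.0]]          # swaps the two coordinates
head = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # one row per vocabulary token

next_logits, second_logits = mtp_logits(hidden, transfer, head)
```

Because the head is shared, any difference between `next_logits` and `second_logits` comes entirely from the transfer layer, which is consistent with the abstract's statement that this layer is where multi-step adjacency information accumulates.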
