TAPE: Tool-Guided Adaptive Planning and Constrained Execution in Language Model Agents

📅 2026-02-23

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses irrecoverable failures in language model agents operating under strict feasibility constraints, which the authors attribute to imperfect planning and stochastic execution. The paper proposes TAPE, a framework that combines tool-guided adaptive planning with constrained execution. It aggregates multiple candidate plans into a graph, uses an external solver to identify a feasible path through that graph, applies constrained decoding during execution to reduce sampling noise, and re-plans whenever environmental feedback deviates from the intended state. Across Sokoban, ALFWorld, MuSiQue, and GSM8K-Hard, TAPE improves success rates by 21.0 percentage points on average in hard settings and by 20.0 percentage points on average for weaker base models.

📝 Abstract

Language Model (LM) agents have demonstrated remarkable capabilities in solving tasks that require multiple interactions with the environment. However, they remain vulnerable in environments where a single error often leads to irrecoverable failure, particularly under strict feasibility constraints. We systematically analyze existing agent frameworks, identifying imperfect planning and stochastic execution as the primary causes. To address these challenges, we propose Tool-guided Adaptive Planning with constrained Execution (TAPE). TAPE enhances planning capability by aggregating multiple plans into a graph and employing an external solver to identify a feasible path. During execution, TAPE employs constrained decoding to reduce sampling noise, while adaptively re-planning whenever environmental feedback deviates from the intended state. Experiments across Sokoban, ALFWorld, MuSiQue, and GSM8K-Hard demonstrate that TAPE consistently outperforms existing frameworks, with particularly large gains on hard settings: success rates improve by 21.0 percentage points on average on hard settings and by 20.0 percentage points on average for weaker base models. Code and data are available here.
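The plan-graph idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the state names, the `(state, action, next_state)` plan format, the breadth-first search standing in for the external solver, and the helper names `build_plan_graph`, `solve`, `run`, and `make_env` are all assumptions made for illustration. Constrained decoding is not modeled, beyond the agent only ever executing actions taken from the solver's path.

```python
from collections import deque


def build_plan_graph(plans):
    """Merge candidate plans into one graph: state -> {action: next_state}."""
    graph = {}
    for plan in plans:
        for state, action, next_state in plan:
            graph.setdefault(state, {})[action] = next_state
    return graph


def solve(graph, start, goal, is_feasible):
    """Stand-in solver (assumption): BFS over feasible states only."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        if state == goal:
            return path
        for action, nxt in graph.get(state, {}).items():
            if nxt not in seen and is_feasible(nxt):
                seen.add(nxt)
                queue.append((nxt, path + [(action, nxt)]))
    return None  # no feasible path in the merged graph


def run(graph, start, goal, is_feasible, step, max_replans=3):
    """Follow the solver's path; re-plan when the observed state deviates."""
    state, replans, trace = start, 0, []
    while state != goal:
        path = solve(graph, state, goal, is_feasible)
        if path is None:
            return False, trace
        for action, expected in path:
            observed = step(state, action)
            trace.append((state, action, observed))
            state = observed
            if observed != expected:  # feedback deviates from the plan
                replans += 1
                break
        if replans > max_replans:
            return False, trace
    return True, trace


def make_env(graph, slips):
    """Hypothetical environment: follows the graph except for one-off slips."""
    slips = dict(slips)

    def step(state, action):
        return slips.pop((state, action), graph[state][action])

    return step


# Three hypothetical candidate plans; state "X" is an infeasible dead end.
plans = [
    [("A", "right", "B"), ("B", "right", "X"), ("X", "up", "G")],
    [("A", "up", "C"), ("C", "right", "D"), ("D", "right", "G")],
    [("A", "right", "B"), ("B", "up", "D")],
]
graph = build_plan_graph(plans)
is_feasible = lambda s: s != "X"

# The first "right" from A slips to C, which forces one re-plan.
step = make_env(graph, {("A", "right"): "C"})
ok, trace = run(graph, "A", "G", is_feasible, step)
print(ok, trace)
```

In this toy setup the solver's initial path stitches together edges from different candidate plans and avoids the infeasible state, which no single plan through `B` achieves on its own. After the slip, the loop re-plans from the observed state rather than continuing with a path that no longer applies.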

Problem

Research questions and friction points this paper is trying to address.

Language Model Agents

Feasibility Constraints

Planning

Execution Robustness

Irrecoverable Failure

Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive Planning

Constrained Execution

Tool-Guided Reasoning