PIVOT: Bridging Planning and Execution in LLM Agents via Trajectory Refinement

📅 2026-05-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Large language model (LLM) agents often produce plans that fail at execution time because of inexecutable actions, constraint violations, and errors that accumulate over long horizons, leaving a gap between planning and execution. This work proposes PIVOT, a framework that treats a complete trajectory as a single object to be optimized end to end. PIVOT iterates through four phases (PLAN, INSPECT, EVOLVE, and VERIFY), refining trajectories through environment interaction guided by structured losses and textual gradients. A monotonic acceptance mechanism ensures that solution quality never decreases across iterations. The approach supports both fully automated and human-in-the-loop optimization and reports state-of-the-art performance on the DeepPlanning and GAIA benchmarks: with human-in-the-loop feedback, constraint satisfaction improves by up to 94% in relative terms, while the fully automated variant retains substantial gains over baselines and uses roughly one-third to one-fifth of the tokens consumed by competing refinement methods.

📝 Abstract

Large language model (LLM)-based agents frequently generate seemingly coherent plans that fail upon execution due to infeasible actions, constraint violations, and compounding errors over extended horizons. PIVOT (Plan-Inspect-eVOlve Trajectories) addresses this plan-execution misalignment through a self-supervised framework that treats trajectories as optimizable objects, iteratively refined via environment interaction. The framework comprises four stages: PLAN generates candidate trajectories; INSPECT executes them and computes structured losses with textual gradients encoding plan-execution discrepancies; EVOLVE applies these signals to produce improved trajectories; and VERIFY performs a final global check against task constraints. A monotonic acceptance process ensures that solution quality is non-decreasing. Empirical evaluations on DeepPlanning and GAIA demonstrate state-of-the-art performance: with human-in-the-loop (HITL) feedback, PIVOT establishes a strong upper bound of up to 94% relative improvement in constraint satisfaction, while its fully autonomous variant retains substantial gains, showing that the core trajectory-refinement mechanism remains effective without external supervision. At the same time, PIVOT remains computationally efficient, requiring 3x to 5x fewer tokens than competing refinement methods. These findings establish that feedback-based trajectory optimization, whether self-supervised or human-supervised, is a principled methodology for mitigating plan-execution gaps in autonomous agent systems.
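
The four stages and the monotonic acceptance rule described in the abstract can be illustrated with a small toy loop. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `refine`, `Feedback`, and the `toy_*` functions are hypothetical, the "environment" is a simple rule that a step is executable only if its size is at most `LIMIT`, the "textual gradient" is reduced to a list of per-step notes, and the acceptance test (strictly lower loss) is one possible reading of "non-decreasing solution quality". In a real agent, PLAN and EVOLVE would presumably be LLM calls rather than the hand-written rules used here.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Trajectory = List[int]  # toy trajectory: a list of step sizes


@dataclass
class Feedback:
    loss: float                      # structured loss, lower is better
    gradient: List[Tuple[int, str]]  # toy "textual gradient": (step index, note)


def refine(
    plan: Callable[[List[int]], Trajectory],
    inspect: Callable[[Trajectory], Feedback],
    evolve: Callable[[Trajectory, Feedback], Trajectory],
    verify: Callable[[Trajectory], bool],
    task: List[int],
    max_iters: int = 5,
):
    """Hypothetical PLAN -> INSPECT -> EVOLVE loop with a final VERIFY."""
    best = plan(task)              # PLAN: initial candidate trajectory
    best_fb = inspect(best)        # INSPECT: execute and score it
    history = [best_fb.loss]
    for _ in range(max_iters):
        if best_fb.loss == 0:
            break
        candidate = evolve(best, best_fb)  # EVOLVE: apply the feedback
        cand_fb = inspect(candidate)
        # Monotonic acceptance (assumed form): keep the candidate only if
        # its loss is strictly lower, so the best loss never gets worse.
        if cand_fb.loss < best_fb.loss:
            best, best_fb = candidate, cand_fb
        history.append(best_fb.loss)
    return best, verify(best), history     # VERIFY: final global check


LIMIT = 3  # assumed executability limit for the toy environment


def toy_plan(task: List[int]) -> Trajectory:
    return list(task)  # naive plan: copy the requested step sizes


def toy_inspect(traj: Trajectory) -> Feedback:
    notes = [
        (i, f"step {i} has size {s}, above the executable limit {LIMIT}")
        for i, s in enumerate(traj)
        if s > LIMIT
    ]
    return Feedback(loss=float(len(notes)), gradient=notes)


def toy_evolve(traj: Trajectory, fb: Feedback) -> Trajectory:
    new = list(traj)
    if fb.gradient:
        index, _note = fb.gradient[0]  # repair the first reported discrepancy
        new[index] = LIMIT
    return new


def toy_verify(traj: Trajectory) -> bool:
    return all(s <= LIMIT for s in traj)


final, ok, history = refine(
    toy_plan, toy_inspect, toy_evolve, toy_verify, task=[1, 7, 2, 9]
)
print(final, ok, history)  # [1, 3, 2, 3] True [2.0, 1.0, 0.0]
```

Because a candidate is kept only when its loss drops, the recorded loss history can never increase, even if the EVOLVE step proposes a worse trajectory; in that case the loop simply retains the previous best.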

Problem

Research questions and friction points this paper is trying to address.

plan-execution misalignment

trajectory refinement

constraint violation

compounding errors

LLM agents

Innovation

Methods, ideas, or system contributions that make the work stand out.

trajectory refinement

plan-execution alignment

self-supervised learning