🤖 AI Summary
This work addresses a problem in deploying large language models (LLMs) in industrial settings: generated plans are often structurally invalid or unnecessarily long, which causes execution failures and inflates tool-calling costs. The authors propose SPIN, a planning wrapper that imposes a strict directed acyclic graph (DAG) contract on LLM-based planning. SPIN validates each plan against that contract and uses repair prompting to obtain an executable plan before execution begins. It then evaluates DAG prefixes incrementally and stops as soon as the current prefix is sufficient to answer the query, skipping the remaining steps. On AssetOpsBench, across 261 scenarios, SPIN reduces executed tasks from 1,061 to 623, raises the Accomplished score from 0.638 to 0.706, and lowers tool calls per run from 11.81 to 6.82. On MCP Bench, the same wrapper improves planning, grounding, and dependency-related scores for both models evaluated.
📝 Abstract
Industrial LLM agent systems often separate planning from execution, yet LLM planners frequently produce structurally invalid or unnecessarily long workflows, leading to brittle failures and avoidable tool and API costs. We propose \texttt{SPIN}, a planning wrapper that combines validated Directed Acyclic Graph (DAG) planning with prefix-based execution control. \texttt{SPIN} enforces a strict DAG contract through \texttt{\_validate\_plan\_text} and repair prompting, producing executable plans before downstream execution, and then evaluates DAG prefixes incrementally to stop when the current prefix is sufficient to answer the query. On AssetOpsBench, across 261 scenarios, \texttt{SPIN} reduces executed tasks from 1061 to 623 and improves \emph{Accomplished} from 0.638 to 0.706, while reducing tool calls from 11.81 to 6.82 per run. On MCP Bench, the same wrapper improves planning, grounding, and dependency-related scores for both GPT OSS1 and Llama 4 Maverick.
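
The two mechanisms described in the abstract, DAG validation before execution and prefix-based early stopping, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the plan representation (a dict mapping each step to its dependencies) and the names `validate_plan`, `run_with_early_stop`, `run_step`, and `is_sufficient` are assumptions made for illustration, and the sufficiency check stands in for whatever evaluator SPIN actually uses.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Any, Callable

# Hypothetical plan format: step id -> ids of the steps it depends on.
Plan = dict[str, list[str]]


def validate_plan(plan: Plan) -> list[str]:
    """Return DAG-contract violations; an empty list means the plan is valid."""
    errors: list[str] = []
    for step, deps in plan.items():
        for dep in deps:
            if dep == step:
                errors.append(f"{step}: depends on itself")
            elif dep not in plan:
                errors.append(f"{step}: unknown dependency {dep!r}")
    if not errors:
        try:
            # static_order() raises CycleError if the graph is not acyclic.
            tuple(TopologicalSorter(plan).static_order())
        except CycleError as exc:
            errors.append(f"cycle detected: {exc.args[1]}")
    return errors


def run_with_early_stop(
    plan: Plan,
    run_step: Callable[[str, dict[str, Any]], Any],
    is_sufficient: Callable[[dict[str, Any]], bool],
) -> dict[str, Any]:
    """Execute steps in dependency order, stopping once the prefix suffices."""
    errors = validate_plan(plan)
    if errors:
        # In a full system these messages could be fed into a repair prompt.
        raise ValueError("; ".join(errors))
    results: dict[str, Any] = {}
    for step in TopologicalSorter(plan).static_order():
        inputs = {dep: results[dep] for dep in plan[step]}
        results[step] = run_step(step, inputs)
        if is_sufficient(results):
            break  # The executed prefix already answers the query.
    return results


if __name__ == "__main__":
    plan = {"fetch": [], "analyze": ["fetch"], "report": ["analyze"]}
    done = run_with_early_stop(
        plan,
        run_step=lambda step, inputs: f"{step} done",
        is_sufficient=lambda results: "analyze" in results,
    )
    print(list(done))
```

In this toy plan the `report` step is never executed because the sufficiency check passes after `analyze`, which is the kind of saving the reported drop in executed tasks reflects.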