🤖 AI Summary
This work addresses the challenge of deploying large language models to execute multi-step tasks under a strict monetary budget, where the vast state-action space, high exploration cost, and high outcome variance make direct planning intractable. The authors propose INTENT, an inference-time planning framework built around an intention-aware hierarchical world model that anticipates future tool-invocation trajectories and uses risk-calibrated cost estimates to guide online decisions. Evaluated on cost-augmented StableToolBench, INTENT strictly enforces the budget while substantially improving task success over baselines, and it remains robust when tool prices and budgets change.
📝 Abstract
We study budget-constrained tool-augmented agents, where a large language model must solve multi-step tasks by invoking external tools under a strict monetary budget. We formalize this setting as sequential decision making in context space with priced and stochastic tool executions, where direct planning is intractable due to massive state-action spaces, high outcome variance, and prohibitive exploration cost. To address these challenges, we propose INTENT, an inference-time planning framework that leverages an intention-aware hierarchical world model to anticipate future tool usage, estimate risk-calibrated cost, and guide decisions online. On cost-augmented StableToolBench, INTENT strictly enforces hard budget feasibility while substantially improving task success over baselines, and remains robust under dynamic market shifts such as tool price changes and varying budgets.
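
To make the idea of hard budget feasibility under stochastic tool prices concrete, here is a minimal Python sketch. It is not INTENT's actual procedure: the abstract does not say how the risk calibration is computed, so this sketch assumes a simple mean-plus-standard-deviation bound over independent tool costs. All names (`ToolCost`, `risk_calibrated_cost`, `is_feasible`, `pick_plan`) and all numbers are hypothetical.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class ToolCost:
    """Hypothetical cost model for one tool call (not from the paper)."""
    mean: float      # expected price of the call
    variance: float  # variance of the price


def risk_calibrated_cost(plan, risk_weight=1.0):
    """Pessimistic cost estimate for a sequence of tool calls.

    Assumption: costs are independent, so variances add. The estimate is
    the total mean plus `risk_weight` standard deviations.
    """
    total_mean = sum(call.mean for call in plan)
    total_std = sqrt(sum(call.variance for call in plan))
    return total_mean + risk_weight * total_std


def is_feasible(plan, spent, budget, risk_weight=1.0):
    """A plan is admissible only if its pessimistic cost fits the remaining budget."""
    return spent + risk_calibrated_cost(plan, risk_weight) <= budget


def pick_plan(candidates, spent, budget, risk_weight=1.0):
    """Return the name of the feasible candidate with the highest success estimate.

    `candidates` is a list of (name, estimated_success, plan) tuples.
    Returns None when no candidate fits the budget.
    """
    feasible = [
        (success, name)
        for name, success, plan in candidates
        if is_feasible(plan, spent, budget, risk_weight)
    ]
    if not feasible:
        return None
    return max(feasible)[1]


# Two hypothetical anticipated tool-call trajectories.
candidates = [
    ("cheap", 0.6, [ToolCost(10.0, 9.0), ToolCost(20.0, 16.0)]),
    ("thorough", 0.9, [ToolCost(40.0, 0.0), ToolCost(30.0, 0.0)]),
]
print(pick_plan(candidates, spent=0.0, budget=50.0))
print(pick_plan(candidates, spent=0.0, budget=100.0))
```

With a budget of 50 only the cheaper trajectory passes the check, while a budget of 100 admits the more expensive one with the higher success estimate. Raising `risk_weight` makes the filter more conservative, which is one simple way to trade expected success against the chance of overspending.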