Long-Horizon Plan Execution in Large Tool Spaces through Entropy-Guided Branching

📅 2026-04-13

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses two obstacles to multi-step task execution in large tool libraries: the lack of rigorous, plan-level evaluation frameworks and the computational cost of searching a large decision space. The authors introduce SLATE (Synthetic Large-scale API Toolkit for E-commerce), a large-scale, context-aware benchmark for automated evaluation of tool-augmented agents that accepts diverse yet functionally valid execution trajectories. They further propose Entropy-Guided Branching (EGB), a search algorithm that expands decision branches where predictive entropy is high, balancing exploration against exploitation. The authors report that, in experiments on SLATE's synthetic e-commerce API environment, EGB significantly improves both task success rates and computational efficiency.

📝 Abstract

Large Language Models (LLMs) have significantly advanced tool-augmented agents, enabling autonomous reasoning via API interactions. However, executing multi-step tasks within massive tool libraries remains challenging due to two critical bottlenecks: (1) the absence of rigorous, plan-level evaluation frameworks and (2) the computational demand of exploring vast decision spaces stemming from large toolsets and long-horizon planning. To bridge these gaps, we first introduce SLATE (Synthetic Large-scale API Toolkit for E-commerce), a large-scale context-aware benchmark designed for the automated assessment of tool-integrated agents. Unlike static metrics, SLATE accommodates diverse yet functionally valid execution trajectories, revealing that current agents struggle with self-correction and search efficiency. Motivated by these findings, we next propose Entropy-Guided Branching (EGB), an uncertainty-aware search algorithm that dynamically expands decision branches where predictive entropy is high. EGB optimizes the exploration-exploitation trade-off, significantly enhancing both task success rates and computational efficiency. Extensive experiments on SLATE demonstrate that our dual contribution provides a robust foundation for developing reliable and scalable LLM agents in tool-rich environments.
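The branching rule described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how EGB estimates predictive entropy, what threshold it uses, or how many branches it expands, so everything below is an assumption made for illustration: the function names `predictive_entropy` and `select_branches`, the `threshold` and `max_branches` parameters, and the tool names in the usage note are all hypothetical, and Shannon entropy over a next-tool distribution stands in for whatever uncertainty measure the authors actually use.

```python
import math
from typing import Dict, List


def predictive_entropy(probs: Dict[str, float]) -> float:
    """Shannon entropy (in nats) of a distribution over candidate tool calls."""
    return -sum(p * math.log(p) for p in probs.values() if p > 0.0)


def select_branches(
    probs: Dict[str, float],
    threshold: float = 0.5,   # hypothetical entropy cutoff, not from the paper
    max_branches: int = 3,    # hypothetical branching cap, not from the paper
) -> List[str]:
    """Pick which candidate tool calls to expand at one planning step.

    Low entropy (the model is confident): follow only the top candidate.
    High entropy (the model is uncertain): expand up to `max_branches` candidates.
    """
    # Rank candidates from most to least probable.
    ranked = sorted(probs, key=lambda tool: probs[tool], reverse=True)
    if predictive_entropy(probs) <= threshold:
        return ranked[:1]
    return ranked[:max_branches]
```

With a peaked distribution such as `{"search_products": 0.97, "get_cart": 0.02, "apply_coupon": 0.01}`, this sketch follows a single branch; with a flatter one such as `{"search_products": 0.4, "get_cart": 0.35, "apply_coupon": 0.25}`, it expands several. Search effort is therefore spent only at the steps where the model is unsure, which is the exploration-exploitation trade-off the abstract refers to.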

Problem

Research questions and friction points this paper is trying to address.

Long-Horizon Planning

Large Tool Spaces

Plan Execution

Decision Space Exploration

Tool-Augmented Agents

Innovation

Methods, ideas, or system contributions that make the work stand out.

Entropy-Guided Branching

SLATE Benchmark

Tool-Augmented Agents