Beyond Single-Shot: Multi-step Tool Retrieval via Query Planning

📅 2026-01-12

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses two challenges that LLM agents face when retrieving from large, dynamic tool libraries: the gap between abstract user goals and technical tool documentation, and the limited capacity of fixed-size embeddings to capture combinatorial tool compositions. The authors propose TOOLQP, a lightweight framework that formulates tool retrieval as a learnable multi-step query planning problem. TOOLQP decomposes an instruction into sub-tasks, dynamically generates queries for them, and iteratively invokes the retriever to assemble the required tool combination. Training proceeds in two stages: learning from synthetic query trajectories, followed by optimization via Reinforcement Learning with Verifiable Rewards (RLVR). The reported experiments show state-of-the-art performance on multiple benchmarks, with stronger zero-shot generalization, robustness across diverse retrievers, and significant improvements in downstream agent execution.
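The decompose, query, retrieve loop described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the `Tool` type, the toy word-overlap `retrieve` function (standing in for a dense retriever), the three-tool library, and above all `plan_queries`, which simply splits the instruction on " and " where TOOLQP would use a trained planner that generates queries dynamically.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    doc: str


def tokenize(text: str) -> set[str]:
    """Lowercase and split on whitespace."""
    return set(text.lower().split())


def retrieve(query: str, library: list[Tool], k: int = 1) -> list[Tool]:
    """Toy retriever (assumption): rank tools by word overlap with the query."""
    q = tokenize(query)
    scored = [(len(q & tokenize(tool.doc)), tool) for tool in library]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [tool for score, tool in ranked[:k] if score > 0]


def plan_queries(instruction: str) -> list[str]:
    """Hypothetical planner stand-in: one query per ' and '-separated clause."""
    return [part.strip() for part in instruction.split(" and ") if part.strip()]


def multi_step_retrieve(instruction: str, library: list[Tool], k: int = 1) -> list[Tool]:
    """Issue one retrieval call per planned query and merge results in order."""
    selected: list[Tool] = []
    for query in plan_queries(instruction):
        for tool in retrieve(query, library, k):
            if tool not in selected:
                selected.append(tool)
    return selected


LIBRARY = [
    Tool("get_weather", "get the weather forecast for a city"),
    Tool("send_email", "send an email message to a recipient"),
    Tool("convert_currency", "convert an amount between two currencies"),
]
INSTRUCTION = "check the weather forecast in Paris and email the summary to my boss"

if __name__ == "__main__":
    print([tool.name for tool in multi_step_retrieve(INSTRUCTION, LIBRARY)])
```

In this toy setup, each sub-query targets one sub-task, so a retrieval budget of one tool per query is enough to surface both tools, whereas a single top-1 lookup over the full instruction returns only one of them.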

📝 Abstract

LLM agents operating over massive, dynamic tool libraries rely on effective retrieval, yet standard single-shot dense retrievers struggle with complex requests. These failures primarily stem from the disconnect between abstract user goals and technical documentation, and the limited capacity of fixed-size embeddings to model combinatorial tool compositions. To address these challenges, we propose TOOLQP, a lightweight framework that models retrieval as iterative query planning. Instead of single-shot matching, TOOLQP decomposes instructions into sub-tasks and dynamically generates queries to interact with the retriever, effectively bridging the semantic gap by targeting the specific sub-tasks required for composition. We train TOOLQP using synthetic query trajectories followed by optimization via Reinforcement Learning with Verifiable Rewards (RLVR). Experiments demonstrate that TOOLQP achieves state-of-the-art performance, exhibiting superior zero-shot generalization, robustness across diverse retrievers, and significant improvements in downstream agentic execution.
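The abstract does not say how the verifiable reward is computed. As a rough illustration of what "verifiable" can mean in a retrieval setting, the sketch below scores a retrieved tool set against a labeled gold set using set-level F1. The choice of F1, the function name `retrieval_reward`, and the tool names in the usage lines are all assumptions for illustration, not the paper's reward.

```python
def retrieval_reward(retrieved: list[str], gold: set[str]) -> float:
    """Set-level F1 between retrieved tool names and gold tool names.

    Assumption: this is one plausible automatically checkable reward,
    not the reward defined by the TOOLQP authors.
    """
    predicted = set(retrieved)
    hits = len(predicted & gold)
    if hits == 0:
        # Covers empty predictions, empty gold sets, and zero overlap.
        return 0.0
    precision = hits / len(predicted)
    recall = hits / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold_tools = {"get_weather", "send_email"}
    print(retrieval_reward(["get_weather", "send_email"], gold_tools))
    print(retrieval_reward(["get_weather", "convert_currency"], gold_tools))
```

A reward of this kind needs no learned judge: it can be checked mechanically from the gold annotation, which is the property that makes it usable as a reinforcement learning signal for a query planner.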

Problem

Research questions and friction points this paper is trying to address.

tool retrieval

query planning

semantic gap

combinatorial composition

LLM agents

Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-step retrieval

query planning

tool composition