AI Summary
This work addresses two key challenges faced by large language models in complex tasks over dynamic tool libraries: the difficulty of aligning abstract user goals with technical documentation, and the inability of fixed-dimensional embeddings to capture the combinatorial nature of tool compositions. To overcome these limitations, the authors propose TOOLQP, a novel framework that formulates tool retrieval as a learnable multi-step query planning problem. TOOLQP decomposes tasks into sub-goals, dynamically constructs queries, and iteratively invokes a retriever to precisely identify effective tool combinations. The approach combines pretraining on synthetic query trajectories with Reinforcement Learning with Verifiable Rewards (RLVR), substantially enhancing zero-shot generalization and robustness across diverse retrievers. Experimental results demonstrate that TOOLQP achieves state-of-the-art performance on multiple benchmarks and significantly improves downstream agent execution success rates.
Abstract
LLM agents operating over massive, dynamic tool libraries rely on effective retrieval, yet standard single-shot dense retrievers struggle with complex requests. These failures primarily stem from the disconnect between abstract user goals and technical documentation, and the limited capacity of fixed-size embeddings to model combinatorial tool compositions. To address these challenges, we propose TOOLQP, a lightweight framework that models retrieval as iterative query planning. Instead of single-shot matching, TOOLQP decomposes instructions into sub-tasks and dynamically generates queries to interact with the retriever, effectively bridging the semantic gap by targeting the specific sub-tasks required for composition. We train TOOLQP using synthetic query trajectories followed by optimization via Reinforcement Learning with Verifiable Rewards (RLVR). Experiments demonstrate that TOOLQP achieves state-of-the-art performance, exhibiting superior zero-shot generalization, robustness across diverse retrievers, and significant improvements in downstream agentic execution.
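The iterative query-planning loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the planner, retriever, toy tool catalog, and stopping rule are all hypothetical stand-ins, and in TOOLQP the planner would be a trained LLM rather than a hand-written function.

```python
# Illustrative sketch of iterative query planning for tool retrieval.
# All names (iterative_tool_retrieval, plan_next_query, retrieve, the
# toy catalog) are hypothetical, chosen for exposition only.
from typing import Callable, List, Optional, Set


def iterative_tool_retrieval(
    instruction: str,
    plan_next_query: Callable[[str, List[str], Set[str]], Optional[str]],
    retrieve: Callable[[str, int], List[str]],
    max_steps: int = 5,
    top_k: int = 3,
) -> List[str]:
    """Decompose an instruction into sub-task queries and iteratively
    invoke a retriever, accumulating candidate tools until the planner
    emits no further query or the step budget is exhausted."""
    queries: List[str] = []
    found: Set[str] = set()
    ordered: List[str] = []
    for _ in range(max_steps):
        query = plan_next_query(instruction, queries, found)
        if query is None:  # planner judges current coverage sufficient
            break
        queries.append(query)
        for tool in retrieve(query, top_k):
            if tool not in found:
                found.add(tool)
                ordered.append(tool)
    return ordered


# Toy stand-ins: a keyword-overlap "retriever" over a tiny catalog and a
# planner that emits two fixed sub-task queries, then stops.
catalog = {
    "currency_convert": "convert amounts between currencies",
    "flight_search": "search for flights between cities",
    "weather_lookup": "get the weather forecast for a city",
}


def toy_retrieve(query: str, k: int) -> List[str]:
    q = set(query.split())
    ranked = sorted(catalog, key=lambda t: -len(q & set(catalog[t].split())))
    return ranked[:k]


def toy_planner(instruction, past_queries, found) -> Optional[str]:
    subgoals = [
        "search for flights between cities",
        "convert amounts between currencies",
    ]
    for q in subgoals:
        if q not in past_queries:
            return q
    return None  # all sub-goals covered


tools = iterative_tool_retrieval(
    "book a flight to Paris and convert the price to USD",
    toy_planner,
    toy_retrieve,
    top_k=1,
)
```

With `top_k=1`, each sub-task query surfaces the one tool whose description matches it, so the loop returns `["flight_search", "currency_convert"]` — a composition that a single embedding of the full instruction could easily miss.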