AutoTool: Dynamic Tool Selection and Integration for Agentic Reasoning

📅 2025-12-15
📈 Citations: 0 (influential: 0)
🤖 AI Summary
Existing LLM-agent reinforcement learning methods rely on static tool sets, limiting flexibility and generalization in long-horizon reasoning when the tool environment evolves. To address this, we propose a dynamic multi-step tool-selection mechanism that identifies, ranks, and integrates previously unseen tools in real time during inference. We construct the first large-scale tool-selection dataset with explicit rationales, comprising 200K samples, 1,000+ tools, and 100+ tasks. Building on this data, we design a two-phase framework of supervised fine-tuning plus RL trajectory stabilization, incorporating a KL-regularized Plackett-Luce ranking model, and adapt both a text model (Qwen3-8B) and a vision-language model (Qwen2.5-VL-7B). Our approach yields average improvements of +6.4% on mathematical and scientific reasoning, +4.5% on search-based QA, +7.7% on code generation, and +6.9% on multimodal understanding across ten benchmarks, while using fewer parameters and demonstrating superior zero-shot tool generalization.
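The "identify, rank, integrate" loop over an evolving tool set can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the names `select_tools`, `registry`, and `keyword_overlap` are assumptions, and `keyword_overlap` merely stands in for the learned ranker's scores so the example is runnable.

```python
def select_tools(rank_tool, registry, context, top_k=3):
    """Score every currently registered tool (including tools added
    after training) against the reasoning context and keep the top-k.
    `rank_tool(context, description)` abstracts the learned ranker."""
    scored = sorted(registry.items(),
                    key=lambda kv: rank_tool(context, kv[1]),
                    reverse=True)
    return [name for name, _ in scored[:top_k]]


def keyword_overlap(context, description):
    """Toy stand-in scorer: count shared words between the context
    and a tool's description."""
    return len(set(context.lower().split()) & set(description.split()))


# Hypothetical registry; in a real agent, entries may appear at inference time.
registry = {
    "calculator": "evaluate arithmetic expressions",
    "web_search": "search the web for facts",
    "python_repl": "run python code snippets",
}

chosen = select_tools(keyword_overlap, registry,
                      "evaluate this arithmetic problem", top_k=2)
```

Because selection re-scores the registry at every reasoning step, a tool registered after training is treated no differently from a seen one, which is the behavior the summary describes as zero-shot tool generalization.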

📝 Abstract
Agentic reinforcement learning has advanced large language models (LLMs) to reason through long chain-of-thought trajectories while interleaving external tool use. Existing approaches assume a fixed inventory of tools, limiting LLM agents' adaptability to new or evolving toolsets. We present AutoTool, a framework that equips LLM agents with dynamic tool-selection capabilities throughout their reasoning trajectories. We first construct a 200k dataset with explicit tool-selection rationales across 1,000+ tools and 100+ tasks spanning mathematics, science, code generation, and multimodal reasoning. Building on this data foundation, AutoTool employs a dual-phase optimization pipeline: (i) supervised and RL-based trajectory stabilization for coherent reasoning, and (ii) KL-regularized Plackett-Luce ranking to refine consistent multi-step tool selection. Across ten diverse benchmarks, we train two base models, Qwen3-8B and Qwen2.5-VL-7B, with AutoTool. With fewer parameters, AutoTool consistently outperforms advanced LLM agents and tool-integration methods, yielding average gains of 6.4% in math & science reasoning, 4.5% in search-based QA, 7.7% in code generation, and 6.9% in multimodal understanding. In addition, AutoTool exhibits stronger generalization by dynamically leveraging unseen tools from evolving toolsets during inference.
Problem

Research questions and friction points this paper is trying to address.

Enables dynamic tool selection for LLM agents during reasoning
Overcomes fixed tool inventory limitations in existing approaches
Enhances adaptability to new and evolving toolsets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic tool selection for agentic reasoning
Dual-phase optimization for coherent reasoning
KL-regularized ranking for consistent tool selection
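The ranking objective named above can be made concrete. The sketch below, using only the standard library, shows a Plackett-Luce negative log-likelihood over an observed tool ordering plus a KL term that keeps the tuned ranker's tool distribution near a reference model's. Function names, the `beta` weight, and the exact form of the regularizer are assumptions for illustration; the paper's loss may differ in detail.

```python
import math


def plackett_luce_nll(scores, ranking):
    """Negative log-likelihood of an observed tool ranking under the
    Plackett-Luce model: at each step, the next-ranked tool is drawn
    softmax-style from the tools not yet chosen."""
    nll = 0.0
    remaining = list(ranking)
    while remaining:
        log_z = math.log(sum(math.exp(scores[t]) for t in remaining))
        nll += log_z - scores[remaining[0]]
        remaining.pop(0)
    return nll


def kl_softmax(scores, ref_scores):
    """KL(softmax(scores) || softmax(ref_scores)) over the tool set,
    penalizing drift from the reference ranker."""
    def softmax(s):
        m = max(s.values())
        exps = {k: math.exp(v - m) for k, v in s.items()}
        z = sum(exps.values())
        return {k: v / z for k, v in exps.items()}
    p, q = softmax(scores), softmax(ref_scores)
    return sum(p[k] * math.log(p[k] / q[k]) for k in p)


def kl_regularized_pl_loss(scores, ref_scores, ranking, beta=0.1):
    """Combined objective: fit the observed ranking while staying
    close to the reference distribution (beta is illustrative)."""
    return plackett_luce_nll(scores, ranking) + beta * kl_softmax(scores, ref_scores)
```

For three tools with equal scores, the observed ordering has probability 1/3! under Plackett-Luce, so the NLL is log 6; the KL term vanishes when the tuned and reference scores coincide.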