🤖 AI Summary
Existing LLM-agent reinforcement learning methods rely on static tool sets, limiting flexibility and generalization in long-horizon reasoning over dynamically evolving tool environments. To address this, we propose a dynamic multi-step tool-selection mechanism that identifies, ranks, and integrates previously unseen tools in real time during inference. We construct the first large-scale tool-selection dataset with explicit reasoning, comprising 200K samples, 1,000+ tools, and 100+ tasks. We further design a framework that combines supervised fine-tuning with RL-based trajectory stabilization, incorporates a KL-regularized Plackett–Luce ranking model, and achieves dual-modality co-adaptation for Qwen3-8B and Qwen2.5-VL-7B. Our approach yields average improvements of +6.4% on mathematical and scientific reasoning, +4.5% on search-based QA, +7.7% on code generation, and +6.9% on multimodal understanding across ten benchmarks, while using fewer parameters and demonstrating superior zero-shot tool generalization.
📝 Abstract
Agentic reinforcement learning has advanced large language models (LLMs) to reason through long chain-of-thought trajectories while interleaving external tool use. Existing approaches assume a fixed inventory of tools, limiting LLM agents' adaptability to new or evolving toolsets. We present AutoTool, a framework that equips LLM agents with dynamic tool-selection capabilities throughout their reasoning trajectories. We first construct a 200K-sample dataset with explicit tool-selection rationales across 1,000+ tools and 100+ tasks spanning mathematics, science, code generation, and multimodal reasoning. Building on this data foundation, AutoTool employs a dual-phase optimization pipeline: (i) supervised and RL-based trajectory stabilization for coherent reasoning, and (ii) KL-regularized Plackett–Luce ranking to refine consistent multi-step tool selection. Across ten diverse benchmarks, we train two base models, Qwen3-8B and Qwen2.5-VL-7B, with AutoTool. With fewer parameters, AutoTool consistently outperforms advanced LLM agents and tool-integration methods, yielding average gains of 6.4% in math & science reasoning, 4.5% in search-based QA, 7.7% in code generation, and 6.9% in multimodal understanding. In addition, AutoTool exhibits stronger generalization by dynamically leveraging unseen tools from evolving toolsets during inference.
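The abstract names KL-regularized Plackett–Luce ranking as the objective for multi-step tool selection but does not spell out the loss. The sketch below is a minimal, hedged illustration of that building block, not AutoTool's actual implementation: the function names, the softmax reference distribution, and the regularization weight `beta` are all assumptions. Under the Plackett–Luce model, the probability of an observed tool ranking is a product of softmax terms over the tools still remaining at each step, and the KL term keeps the learned tool scores close to a reference policy's scores.

```python
import math


def plackett_luce_nll(scores, ranking):
    """Negative log-likelihood of a ranking under the Plackett-Luce model.

    scores  : per-tool utility scores (list of floats).
    ranking : tool indices ordered best-first.
    At step i, the chosen tool competes only against tools not yet ranked.
    """
    nll = 0.0
    for i in range(len(ranking)):
        tail = ranking[i:]  # tools still available at this step
        log_z = math.log(sum(math.exp(scores[j]) for j in tail))
        nll += log_z - scores[ranking[i]]
    return nll


def softmax(s):
    m = max(s)  # subtract max for numerical stability
    e = [math.exp(x - m) for x in s]
    z = sum(e)
    return [x / z for x in e]


def kl_penalty(scores, ref_scores):
    """KL(softmax(scores) || softmax(ref_scores)) over the tool set."""
    p, q = softmax(scores), softmax(ref_scores)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def kl_regularized_pl_loss(scores, ranking, ref_scores, beta=0.1):
    """Ranking loss plus a KL anchor to a reference policy (beta is hypothetical)."""
    return plackett_luce_nll(scores, ranking) + beta * kl_penalty(scores, ref_scores)
```

For two tools with scores `[1.0, 0.0]` and observed ranking `[0, 1]`, the loss reduces to `log(1 + e^{-1})`, and the KL term vanishes when the learned and reference scores coincide; in practice the same quantities would be computed with autograd tensors so the ranking gradient can flow back into the agent's scoring head.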