Are Tools Always Beneficial? Learning to Invoke Tools Adaptively for Dual-Mode Multimodal LLM Reasoning

📅 2026-05-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the problem that unnecessary or inappropriate tool calls in multimodal reasoning add computational overhead and can even mislead inference. The authors propose AutoTool, a model that uses an adaptive dual-mode reasoning strategy within a reinforcement learning framework. AutoTool decides per query whether to invoke external tools, choosing between tool-assisted and text-centric reasoning, and is trained with mode-specific reward functions. To prevent premature convergence to a single reasoning mode, training jointly explores and balances both modes and encourages free exploration in later stages. Experiments show that AutoTool achieves a 21.8% accuracy gain over the base model on the V* benchmark and a 44.9% efficiency improvement over existing tool-augmented methods on the POPE benchmark.

📝 Abstract

Tool-augmented reasoning has emerged as a promising direction for enhancing the reasoning capabilities of multimodal large language models (MLLMs). However, existing studies mainly focus on enabling models to perform tool invocation, while neglecting the necessity of invoking tools. We argue that tool usage is not always beneficial, as redundant or inappropriate invocations largely increase reasoning overhead and even mislead model predictions. To address this issue, we introduce AutoTool, a model that adaptively decides whether to invoke tools according to the characteristics of each query. Within a reinforcement learning framework, we design an explicit dual-mode reasoning strategy with mode-specific reward functions to guide the model toward producing accurate responses. Moreover, to prevent premature bias toward a single reasoning mode, AutoTool jointly explores and balances tool-assisted and text-centric reasoning throughout training, and promotes free exploration in later stages. Extensive experiments demonstrate that AutoTool exhibits outstanding performance and high efficiency, yielding a 21.8% accuracy gain on V* benchmark compared to the base model, and a 44.9% improvement in efficiency over existing tool-augmented methods on POPE benchmark. Code is available at https://github.com/MQinghe/AutoTool.
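
To make the idea of mode-specific rewards concrete, here is a minimal toy sketch in Python. It is not the AutoTool implementation: the abstract does not give the reward formulas, so the `Rollout` fields, the `mode_reward` function, and the `call_cost` constant are all hypothetical names and values chosen for illustration. The sketch only shows one plausible way a reward could treat a text-centric rollout and a tool-assisted rollout differently.

```python
from dataclasses import dataclass


@dataclass
class Rollout:
    """One sampled reasoning trajectory (hypothetical structure)."""
    mode: str        # "tool" or "text"; assumed mode label
    correct: bool    # whether the final answer matched the reference
    tool_calls: int  # number of tool invocations in the trajectory


def mode_reward(r: Rollout, call_cost: float = 0.1) -> float:
    """Toy mode-specific reward; every constant here is an assumption."""
    base = 1.0 if r.correct else 0.0
    if r.mode == "text":
        # Text mode is expected to answer without tools;
        # a rollout that calls a tool anyway earns nothing.
        return base if r.tool_calls == 0 else 0.0
    if r.mode == "tool":
        # Tool mode must actually invoke a tool; each call beyond
        # the first subtracts a small cost to discourage redundancy.
        if r.tool_calls == 0:
            return 0.0
        return max(0.0, base - call_cost * (r.tool_calls - 1))
    raise ValueError(f"unknown mode: {r.mode}")
```

Under these assumed rules, a correct text-mode answer and a correct single-call tool-mode answer receive the same reward, so the only thing separating the two modes for a given query is whether each one gets the answer right and how many calls it needs.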

Problem

Research questions and friction points this paper is trying to address.

tool invocation

multimodal LLMs

reasoning overhead

adaptive tool usage

redundant tools

Innovation

Methods, ideas, or system contributions that make the work stand out.

adaptive tool invocation

dual-mode reasoning

multimodal LLM