AdaReasoner: Dynamic Tool Orchestration for Iterative Visual Reasoning

📅 2026-01-26

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the limited capability of multimodal large language models in complex visual reasoning tasks by proposing a method that enables autonomous, dynamic composition of multiple tools for multi-step reasoning. The approach introduces an extensible long-horizon tool-interaction data generation pipeline, a task-success-driven reinforcement learning algorithm named Tool-GRPO, and an adaptive tool-calling mechanism. For the first time, tool usage is modeled as a general reasoning capability rather than a task-specific behavior. The framework supports zero-shot generalization to both novel tools and unseen tasks, achieving state-of-the-art performance across multiple challenging benchmarks: the 7B model yields an average performance gain of 24.9% and surpasses strong baselines—including GPT-5—on tasks such as VSP and Jigsaw.

📝 Abstract

When humans face problems beyond their immediate capabilities, they rely on tools, a practice that offers a promising paradigm for improving visual reasoning in multimodal large language models (MLLMs). Effective reasoning, therefore, hinges on knowing which tools to use, when to invoke them, and how to compose them over multiple steps, even when faced with new tools or new tasks. We introduce AdaReasoner, a family of multimodal models that learn tool use as a general reasoning skill rather than as tool-specific or explicitly supervised behavior. AdaReasoner is enabled by (i) a scalable data curation pipeline exposing models to long-horizon, multi-step tool interactions; (ii) Tool-GRPO, a reinforcement learning algorithm that optimizes tool selection and sequencing based on end-task success; and (iii) an adaptive learning mechanism that dynamically regulates tool usage. Together, these components allow models to infer tool utility from task context and intermediate outcomes, enabling coordination of multiple tools and generalization to unseen tools. Empirically, AdaReasoner exhibits strong tool-adaptive and generalization behaviors: it autonomously adopts beneficial tools, suppresses irrelevant ones, and adjusts tool usage frequency based on task demands, despite never being explicitly trained to do so. These capabilities translate into state-of-the-art performance across challenging benchmarks, improving the 7B base model by +24.9% on average and surpassing strong proprietary systems such as GPT-5 on multiple tasks, including VSP and Jigsaw.
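
The abstract does not spell out how Tool-GRPO turns end-task success into a training signal. As a rough illustration only, the sketch below shows the group-relative advantage step used in standard GRPO, where several rollouts for the same prompt are scored and each reward is normalized against its own group. The function name, the binary success reward, and the use of the population standard deviation are assumptions made for this example; the paper's actual reward design and normalization may differ.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each rollout's reward against its own sampling group.

    Assumption: plain GRPO-style normalization, (r - mean) / (std + eps).
    This is not taken from the AdaReasoner paper itself.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four hypothetical tool-use rollouts for one prompt:
# 1.0 = final answer correct, 0.0 = final answer wrong.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# If every rollout receives the same reward, the group carries no
# learning signal and all advantages collapse to zero.
flat = group_relative_advantages([1.0, 1.0, 1.0])
```

Under this scheme, rollouts whose tool choices led to a correct answer receive positive advantages and the rest receive negative ones, so no per-step supervision of tool calls is needed.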

Problem

Research questions and friction points this paper is trying to address.

visual reasoning

tool orchestration

multimodal large language models

tool generalization

iterative reasoning

Innovation

Methods, ideas, or system contributions that make the work stand out.

tool orchestration

multimodal reasoning

reinforcement learning