OctoTools: An Agentic Framework with Extensible Tools for Complex Reasoning

📅 2025-02-16

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing tool-augmented LLM approaches are often confined to single domains, restricted to specific tool types, or dependent on additional fine-tuning. This paper introduces OctoTools, a training-free, user-friendly, and easily extensible open-source agentic framework for complex reasoning across diverse domains. The framework standardizes heterogeneous tool integration through a unified “tool card” abstraction that encapsulates each tool's functionality (for example, visual understanding, knowledge retrieval, or numerical computation) behind a common interface. A planner handles both high-level planning (decomposing the task) and low-level planning (coordinating individual tool invocations), and an executor carries out the tool calls. Because the framework is training-free, new tools can be added without retraining. Evaluated on 16 diverse tasks, including MathVista, MMLU-Pro, MedQA, and GAIA-Text, OctoTools achieves an average accuracy gain of 9.3% over GPT-4o and outperforms AutoGen, GPT-Functions, and LangChain by up to 10.6% when given the same set of tools.

📝 Abstract

Solving complex reasoning tasks may involve visual understanding, domain knowledge retrieval, numerical calculation, and multi-step reasoning. Existing methods augment large language models (LLMs) with external tools but are restricted to specialized domains, limited tool types, or require additional training data. In this paper, we introduce OctoTools, a training-free, user-friendly, and easily extensible open-source agentic framework designed to tackle complex reasoning across diverse domains. OctoTools introduces standardized tool cards to encapsulate tool functionality, a planner for both high-level and low-level planning, and an executor to carry out tool usage. We validate OctoTools' generality across 16 diverse tasks (including MathVista, MMLU-Pro, MedQA, and GAIA-Text), achieving substantial average accuracy gains of 9.3% over GPT-4o. Furthermore, OctoTools outperforms AutoGen, GPT-Functions and LangChain by up to 10.6% when given the same set of tools. Through comprehensive analysis and ablations, OctoTools demonstrates advantages in task planning, effective tool usage, and multi-step problem solving.
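
The three components named in the abstract (tool cards, a planner, and an executor) can be illustrated with a toy sketch. This is not the OctoTools API: `ToolCard`, `plan`, `execute`, `solve`, and `TOOLBOX` are hypothetical names chosen for illustration, and the rule-based `plan` function is a stand-in for the LLM-driven planner the paper describes.

```python
import operator
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple


@dataclass
class ToolCard:
    """Hypothetical tool card: metadata plus a callable behind one interface."""
    name: str
    description: str
    run: Callable[[str], Any]


# Toy numerical tool: evaluates "a <op> b" for the four basic operators.
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


def calculator(expression: str) -> float:
    left, op, right = expression.split()
    return OPS[op](float(left), float(right))


# Toy knowledge-retrieval tool backed by an in-memory dictionary.
FACTS = {"octotools": "a training-free agentic framework"}


def lookup(term: str) -> str:
    return FACTS.get(term.lower(), "unknown")


# Registering a tool only requires adding a card; nothing is retrained.
TOOLBOX: Dict[str, ToolCard] = {
    card.name: card
    for card in (
        ToolCard("calculator", "Evaluate a simple arithmetic expression.", calculator),
        ToolCard("lookup", "Retrieve a short fact about a term.", lookup),
    )
}


def plan(query: str, toolbox: Dict[str, ToolCard]) -> Tuple[ToolCard, str]:
    """Stand-in planner (assumption: queries look like '<tool>: <argument>').

    Choosing the tool mimics high-level planning; extracting the argument
    mimics low-level planning of a single tool call.
    """
    tool_name, _, argument = query.partition(":")
    return toolbox[tool_name.strip()], argument.strip()


def execute(card: ToolCard, argument: str) -> Any:
    """Executor: run the selected tool through the common interface."""
    return card.run(argument)


def solve(query: str, toolbox: Dict[str, ToolCard]) -> Any:
    card, argument = plan(query, toolbox)
    return execute(card, argument)
```

In this sketch, `solve("calculator: 3 * 4", TOOLBOX)` routes the query to the calculator card, and adding a new capability amounts to adding one more `ToolCard` entry to `TOOLBOX`.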

Problem

Research questions and friction points this paper is trying to address.

Enhances complex reasoning with extensible tools

Standardizes tool functionality across diverse domains

Improves accuracy in multi-step problem solving

Innovation

Methods, ideas, or system contributions that make the work stand out.

Training-free agentic framework

Standardized tool cards

Planner and executor system

🔎 Similar Papers

ToolGen: Unified Tool Retrieval and Calling via Generation