Learning Evolving Tools for Large Language Models

📅 2024-10-09

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing large language models (LLMs) struggle to adapt to dynamically evolving tools and APIs, resulting in high failure rates and poor robustness in real-world deployment. To address this, we propose ToolEVO, the first adaptive framework that systematically formalizes the tool evolution challenge. ToolEVO introduces a Monte Carlo Tree Search (MCTS)-guided active exploration mechanism and an LLM self-reflection-driven policy update mechanism, enabling online, continual optimization of tool-use strategies. We further construct ToolQA-D, the first benchmark explicitly designed for evaluating model resilience under tool mutations. Experiments demonstrate that ToolEVO significantly improves tool-call accuracy and long-term stability on ToolQA-D, validating its capability for continual learning in dynamic environments. Our core contributions are: (1) a formal problem formulation of tool evolution; (2) an MCTS-guided closed-loop framework integrating autonomous exploration, reflection, and policy update; and (3) a dedicated evaluation benchmark, ToolQA-D.

📝 Abstract

Tool learning enables large language models (LLMs) to interact with external tools and APIs, greatly expanding the application scope of LLMs. However, due to the dynamic nature of external environments, these tools and APIs may become outdated over time, preventing LLMs from correctly invoking tools. Existing research primarily focuses on static environments and overlooks this issue, limiting the adaptability of LLMs in real-world applications. In this paper, we propose ToolEVO, a novel framework designed to enhance the adaptive and reflective capabilities of LLMs against tool variability. By leveraging Monte Carlo Tree Search, ToolEVO facilitates active exploration and interaction of LLMs within dynamic environments, allowing for autonomous self-reflection and self-updating of tool usage based on environmental feedback. Additionally, we introduce ToolQA-D, a benchmark specifically designed to evaluate the impact of tool variability. Extensive experiments demonstrate the effectiveness and stability of our approach, highlighting the importance of adaptability to tool variability for effective tool learning. Code: https://github.com/Chen-GX/ToolEVO
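
To make the idea of exploring an evolved tool set and updating tool usage from environmental feedback more concrete, here is a toy sketch. It is not the ToolEVO implementation: it swaps full Monte Carlo Tree Search for a one-level UCT bandit over tool names, and every name in it (`EvolvedToolEnv`, `get_weather`, `get_weather_v2`, `explore`) is hypothetical and chosen only for illustration. The assumed scenario is that a tool was renamed after the agent learned it, and that the environment's error feedback lists the currently available tools.

```python
import math


class EvolvedToolEnv:
    """Hypothetical environment whose tool was renamed after the agent learned it."""

    def __init__(self):
        # Only the new name exists; the agent still remembers "get_weather".
        self._tools = {"get_weather_v2": lambda city: f"sunny in {city}"}

    def call(self, name, arg):
        # On failure, return the available tool names as feedback (an assumption).
        if name not in self._tools:
            return False, sorted(self._tools)
        return True, self._tools[name](arg)


def uct(total_reward, visits, parent_visits, c=1.4):
    """UCT score: mean reward plus an exploration bonus; unvisited actions go first."""
    if visits == 0:
        return math.inf
    return total_reward / visits + c * math.sqrt(math.log(parent_visits) / visits)


def explore(env, known_tools, arg, budget=10):
    """Try tool calls under a fixed budget, adding hinted tools to memory on failure."""
    stats = {name: [0.0, 0] for name in known_tools}  # name -> [reward sum, visits]
    for step in range(1, budget + 1):
        name = max(stats, key=lambda n: uct(stats[n][0], stats[n][1], step))
        ok, feedback = env.call(name, arg)
        stats[name][0] += 1.0 if ok else 0.0
        stats[name][1] += 1
        if not ok:
            # "Reflection" step, reduced here to reading tool names from the error.
            for hinted in feedback:
                stats.setdefault(hinted, [0.0, 0])
    # Prefer the tool with the highest observed success rate.
    best = max(stats, key=lambda n: stats[n][0] / max(stats[n][1], 1))
    return best, stats


env = EvolvedToolEnv()
best_tool, stats = explore(env, ["get_weather"], "Paris")
print(best_tool, stats)
```

In this sketch the outdated name fails on the first call, the feedback adds the renamed tool to the agent's memory, and the UCT scores then favor the tool that actually succeeds. A real system would replace the string lookup with LLM-generated reflections and search over multi-step tool-use trajectories rather than single calls.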

Problem

Research questions and friction points this paper is trying to address.

Large Language Models

Adaptability

Real-world Applications

Innovation

Methods, ideas, or system contributions that make the work stand out.

ToolEVO

Monte Carlo Tree Search

ToolQA-D
