AI Summary
Existing large language models (LLMs) struggle to adapt to dynamically evolving tools and APIs, resulting in high failure rates and poor robustness in real-world deployment. To address this, we propose ToolEVO, the first adaptive framework that systematically formalizes the tool evolution challenge. ToolEVO introduces a Monte Carlo Tree Search (MCTS)-guided active exploration mechanism and an LLM self-reflection-driven policy update mechanism, enabling online, continual optimization of tool-use strategies. We further construct ToolQA-D, the first benchmark explicitly designed for evaluating model resilience under tool mutations. Experiments demonstrate that ToolEVO significantly improves tool-call accuracy and long-term stability on ToolQA-D, validating its capability for continual learning in dynamic environments. Our core contributions are: (1) a formal problem formulation of tool evolution; (2) an MCTS-guided closed-loop framework integrating autonomous exploration, reflection, and policy update; and (3) the establishment of a dedicated evaluation benchmark, ToolQA-D.
Abstract
Tool learning enables large language models (LLMs) to interact with external tools and APIs, greatly expanding the application scope of LLMs. However, due to the dynamic nature of external environments, these tools and APIs may become outdated over time, preventing LLMs from correctly invoking tools. Existing research primarily focuses on static environments and overlooks this issue, limiting the adaptability of LLMs in real-world applications. In this paper, we propose ToolEVO, a novel framework designed to enhance the adaptive and reflective capabilities of LLMs against tool variability. By leveraging Monte Carlo Tree Search, ToolEVO facilitates active exploration and interaction of LLMs within dynamic environments, allowing for autonomous self-reflection and self-updating of tool usage based on environmental feedback. Additionally, we introduce ToolQA-D, a benchmark specifically designed to evaluate the impact of tool variability. Extensive experiments demonstrate the effectiveness and stability of our approach, highlighting the importance of adaptability to tool variability for effective tool learning. Code: https://github.com/Chen-GX/ToolEVO
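To make the idea of exploring under tool variability concrete, the following is a minimal illustrative sketch, not the ToolEVO implementation. It assumes a hypothetical tool (`weather_tool`) whose keyword argument was renamed from `city` to `location`, and it uses a depth-1, bandit-style UCT rule rather than a full Monte Carlo Tree Search. All names, the reward scheme, and the exploration constant are assumptions made for illustration.

```python
import math


def weather_tool(**kwargs):
    """Hypothetical evolved tool: its parameter was renamed from `city` to `location`."""
    if "location" not in kwargs:
        raise TypeError("unexpected arguments; expected keyword 'location'")
    return f"sunny in {kwargs['location']}"


def uct_score(total_reward, visits, parent_visits, c=1.4):
    """UCT score: mean reward plus an exploration bonus (c is an assumed constant)."""
    if visits == 0:
        return math.inf  # try every candidate at least once
    return total_reward / visits + c * math.sqrt(math.log(parent_visits) / visits)


def explore(candidates, tool, budget=10):
    """Depth-1 UCT search over candidate keyword names for calling `tool`.

    Returns the most-visited candidate, the per-candidate statistics, and the
    error messages collected from the environment along the way.
    """
    stats = {name: [0.0, 0] for name in candidates}  # name -> [total reward, visits]
    feedback = []
    for step in range(1, budget + 1):
        # Select the candidate with the highest UCT score.
        name = max(candidates, key=lambda n: uct_score(stats[n][0], stats[n][1], step))
        try:
            tool(**{name: "Paris"})
            reward = 1.0  # the call succeeded
        except TypeError as err:
            reward = 0.0  # the call failed; keep the error text as feedback
            feedback.append(str(err))
        stats[name][0] += reward
        stats[name][1] += 1
    best = max(candidates, key=lambda n: stats[n][1])
    return best, stats, feedback


best, stats, feedback = explore(["city", "place", "location"], weather_tool)
print(best)      # the keyword name the search settled on
print(feedback)  # error messages gathered while exploring
```

In this sketch the error messages are only collected; a reflective system in the spirit described above would presumably pass such feedback back to the LLM so that it can revise its stored description of how the tool is used.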