🤖 AI Summary
This work proposes Tool-R0, a novel framework that enables self-evolution of large language model (LLM)-based tool-using agents without relying on pre-collected task-solution pairs or human supervision. Addressing the limitation of traditional reinforcement learning—which depends heavily on manually annotated data and struggles to support autonomous agent evolution in open-ended environments—Tool-R0 employs self-play reinforcement learning to co-evolve a task generator and a solver. The generator dynamically constructs challenging tasks near the solver's current capability frontier, while the solver learns to invoke real-world tools to complete these tasks, establishing a continuous self-improvement loop driven by evolving task difficulty. Evaluated across multiple tool-use benchmarks, the method achieves a 92.5% relative performance gain over the base model and surpasses fully supervised baselines under comparable settings.
📝 Abstract
Large language models (LLMs) are becoming the foundation for autonomous agents that can use tools to solve complex tasks. Reinforcement learning (RL) has emerged as a common approach for injecting such agentic capabilities, but typically under tightly controlled training setups: it often depends on carefully constructed task-solution pairs and substantial human supervision, which creates a fundamental obstacle to open-ended self-evolution toward superintelligent systems. In this paper, we propose Tool-R0, a framework for training general-purpose tool-calling agents from scratch with self-play RL under a zero-data assumption. Initialized from the same base LLM, Tool-R0 co-evolves a Generator and a Solver with complementary rewards: one proposes targeted, challenging tasks at the other's competence frontier, and the other learns to solve them with real-world tool calls. This creates a self-evolving cycle that requires no pre-existing tasks or datasets. Evaluation on diverse tool-use benchmarks shows that Tool-R0 yields a 92.5% relative improvement over the base model and surpasses fully supervised tool-calling baselines under the same setting. Our work further provides empirical insights into self-play LLM agents by analyzing co-evolution, curriculum dynamics, and scaling behavior.
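The co-evolution dynamic described above can be illustrated with a toy numerical sketch. The function names, scalar "skill"/"difficulty" state, and reward shapes below are illustrative assumptions, not the paper's actual implementation: the Generator is rewarded for proposing tasks near the Solver's competence frontier (success probability around 0.5), while the Solver is rewarded for succeeding, so task difficulty tracks the Solver's growing capability.

```python
# Toy sketch of a Tool-R0-style self-play loop (assumption: all names and
# reward shapes here are hypothetical simplifications for illustration).

def solve_prob(skill: float, difficulty: float) -> float:
    """Chance the Solver completes a task; high when difficulty <= skill."""
    return max(0.0, min(1.0, 1.0 - (difficulty - skill)))

def self_play(rounds: int = 50, lr: float = 0.1) -> tuple[float, float]:
    # Both agents start from the same weak "base model".
    skill, difficulty = 0.1, 0.1
    for _ in range(rounds):
        p = solve_prob(skill, difficulty)
        # Solver reward: succeed more often.
        solver_reward = p
        # Generator reward: peaked when tasks sit at the competence
        # frontier (p ~ 0.5), i.e. neither trivial nor impossible.
        generator_reward = 1.0 - abs(p - 0.5)
        # Solver improves in proportion to its reward, with diminishing
        # returns as skill saturates.
        skill += lr * solver_reward * (1.0 - skill)
        # Generator raises difficulty while tasks are too easy (p > 0.5)
        # and lowers it when they become too hard.
        step = lr * generator_reward
        difficulty += step if p > 0.5 else -step
    return skill, difficulty

final_skill, final_difficulty = self_play()
```

Running this loop, difficulty keeps climbing just ahead of skill rather than diverging, which is the curriculum behavior the self-play setup is designed to induce.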