CoEvolve: Training LLM Agents via Agent-Data Mutual Evolution

📅 2026-04-17

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Conventional reinforcement learning trains LLM agents on a static data distribution, which does not adapt to the agent's evolving behavior and leaves complex environment interactions poorly covered. The authors propose CoEvolve, a framework in which the agent and its training data evolve together in a closed loop. CoEvolve extracts forgetting and uncertainty signals from rollout trajectories, uses them to guide LLM-based synthesis of new tasks, validates those tasks through environment interaction, and uses the validated tasks to update the data distribution. On AppWorld and BFCL, it improves Qwen2.5-7B, Qwen3-4B, and Qwen3-30B-A3B by absolute gains of 19.43%, 15.58%, and 18.14%, respectively, over the base models.
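To make the two feedback signals concrete, here is a minimal Python sketch of how they might be computed from rollout outcomes. The definitions are assumptions for illustration, not the paper's: "forgetting" is taken to mean a task that was solved in an earlier round but fails in every rollout of the latest round, and "uncertainty" is taken to be the Bernoulli variance of success across rollouts. The function names, the data layout, and the `min_uncertainty` threshold are all hypothetical.

```python
from statistics import mean


def success_rate(outcomes):
    """Fraction of successful rollouts; outcomes is a non-empty list of bools."""
    return mean(1.0 if ok else 0.0 for ok in outcomes)


def forgotten(rounds, task_id):
    """Assumed definition: solved in some earlier round, fails in all latest rollouts."""
    *earlier, latest = rounds
    solved_before = any(any(r.get(task_id, [])) for r in earlier)
    fails_now = task_id in latest and not any(latest[task_id])
    return solved_before and fails_now


def uncertainty(outcomes):
    """Assumed definition: variance p * (1 - p), largest when rollouts disagree."""
    p = success_rate(outcomes)
    return p * (1.0 - p)


def select_signal_tasks(rounds, min_uncertainty=0.2):
    """Map each flagged task in the latest round to the signal that flagged it.

    rounds: list of dicts, one per training round, task_id -> list of bool outcomes.
    """
    latest = rounds[-1]
    picked = {}
    for task_id, outcomes in latest.items():
        if forgotten(rounds, task_id):
            picked[task_id] = "forgetting"
        elif uncertainty(outcomes) >= min_uncertainty:
            picked[task_id] = "uncertainty"
    return picked
```

Under these assumptions, a task that is never solved yields neither signal, which matches the intuition that such signals target interaction patterns the agent is close to handling rather than ones far out of reach.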

📝 Abstract

Reinforcement learning for LLM agents is typically conducted on a static data distribution, which fails to adapt to the agent's evolving behavior and leads to poor coverage of complex environment interactions. To address these challenges, we propose CoEvolve, an agent-data mutual evolution framework that enables LLM agents to improve through closed-loop, interaction-driven training. Specifically, CoEvolve extracts feedback signals such as forgetting and uncertainty from rollout trajectories to identify failure-prone interaction patterns, and utilizes them to guide LLM-based task synthesis. The synthesized tasks are validated through environment interaction and utilized to update the data distribution, enabling joint adaptation of the agent and its data. Extensive experiments on AppWorld and BFCL across Qwen2.5-7B, Qwen3-4B, and Qwen3-30B-A3B demonstrate consistent and significant improvements over strong base models, yielding absolute gains of 19.43%, 15.58%, and 18.14%, respectively.
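The closed loop described in the abstract can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: `rollout`, `synthesize`, `validate`, and `train` are hypothetical callables standing in for the agent rollout, the LLM-based task synthesizer, the environment-based validator, and the RL update. "Failure-prone" is approximated here as any task with at least one failed rollout, whereas the paper derives it from forgetting and uncertainty signals.

```python
def coevolve_round(pool, rollout, synthesize, validate, train, n_rollouts=4):
    """Run one hypothetical agent-data co-evolution round and return the new task pool.

    pool:       list of task descriptions currently in the training distribution
    rollout:    task -> bool, whether one agent attempt succeeded (assumed interface)
    synthesize: task -> list of new candidate tasks (stand-in for LLM synthesis)
    validate:   task -> bool, whether the environment accepts the task as well formed
    train:      dict of task -> outcomes, performs the agent update (stand-in for RL)
    """
    # 1. Roll out the current agent on every task in the pool.
    results = {task: [rollout(task) for _ in range(n_rollouts)] for task in pool}

    # 2. Identify failure-prone tasks (simplified: any failed rollout).
    failure_prone = [task for task, oks in results.items() if not all(oks)]

    # 3. Synthesize candidate tasks guided by the failure-prone ones.
    candidates = [new for task in failure_prone for new in synthesize(task)]

    # 4. Keep only new candidates that pass environment validation.
    accepted = [c for c in candidates if c not in pool and validate(c)]

    # 5. Update the agent on the collected rollouts, then grow the data pool.
    train(results)
    return list(pool) + accepted
```

Calling `coevolve_round` repeatedly, feeding each returned pool into the next call, gives the iterative loop in which the agent and its data distribution adapt to each other.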

Problem

Research questions and friction points this paper is trying to address.

reinforcement learning

LLM agents

static data distribution

environment interaction

data adaptation

Innovation

Methods, ideas, or system contributions that make the work stand out.

agent-data mutual evolution

interaction-driven training

LLM agent