🤖 AI Summary
To address the high cost of real-world interaction data and the limitations of static datasets in keeping pace with evolving LLM-agent capabilities, this paper proposes a curriculum learning framework featuring dynamic generative environments and co-evolving agents. The method introduces: (1) an α-curriculum reward mechanism that dynamically adjusts task difficulty based on real-time assessment of agent capability; (2) a bidirectional co-evolution paradigm between generative environments and agents, enabling capability-adaptive task generation and environment modeling; and (3) integration of dynamic curriculum learning with LLM reinforcement fine-tuning. Evaluated on five benchmarks, the approach achieves up to a 40.3% improvement over a 7B baseline—matching the performance of significantly larger models—while requiring only 30.3% of the data volume used by Gemini 2.5 Pro's offline augmentation method.
📝 Abstract
Training capable Large Language Model (LLM) agents is critically bottlenecked by the high cost and static nature of real-world interaction data. We address this by introducing GenEnv, a framework that establishes a difficulty-aligned co-evolutionary game between an agent and a scalable, generative environment simulator. Unlike traditional methods that evolve models on static datasets, GenEnv instantiates a data-evolving loop: the simulator acts as a dynamic curriculum policy, continuously generating tasks specifically tailored to the agent's "zone of proximal development". This process is guided by a simple but effective $α$-Curriculum Reward, which aligns task difficulty with the agent's current capabilities. We evaluate GenEnv on five benchmarks: API-Bank, ALFWorld, BFCL, Bamboogle, and TravelPlanner. Across these tasks, GenEnv improves agent performance by up to **+40.3%** over 7B baselines and matches or exceeds the average performance of larger models. Compared to Gemini 2.5 Pro-based offline data augmentation, GenEnv achieves better performance while using 3.3× less data. By shifting from static supervision to adaptive simulation, GenEnv provides a data-efficient pathway for scaling agent capabilities.
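The abstract does not give the functional form of the $α$-Curriculum Reward. The sketch below is one plausible reading, in which the task generator is rewarded most for producing tasks whose measured agent success rate sits near a target difficulty $α$; the function name, the linear shape, and the default $α$ are illustrative assumptions, not the paper's actual implementation:

```python
def alpha_curriculum_reward(success_rate: float, alpha: float = 0.5) -> float:
    """Hypothetical generator reward: peaks at 1.0 when the agent's
    success rate on generated tasks equals the target difficulty alpha,
    and decays linearly as tasks become too easy or too hard."""
    return 1.0 - abs(success_rate - alpha)

# Tasks the agent always solves (too easy) or never solves (too hard)
# earn the generator less reward than tasks sitting in the agent's
# "zone of proximal development".
easy_r = alpha_curriculum_reward(1.0)  # agent solves every task
zpd_r = alpha_curriculum_reward(0.5)   # mid-difficulty tasks
hard_r = alpha_curriculum_reward(0.0)  # agent solves none
```

Under this shaping, the simulator and agent co-evolve: as the agent improves, tasks that used to be mid-difficulty become easy, their reward drops, and the generator is pushed toward harder tasks.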