🤖 AI Summary
To address the high cost of real-world interaction data and the limitations of static datasets in keeping pace with evolving LLM-agent capabilities, this paper proposes a curriculum learning framework featuring dynamic generative environments and co-evolving agents. The method introduces: (1) an α-curriculum reward mechanism that dynamically adjusts task difficulty based on real-time assessment of agent capability; (2) a bidirectional co-evolution paradigm between generative environments and agents, enabling capability-adaptive task generation and environment modeling; and (3) integration of dynamic curriculum learning with LLM reinforcement fine-tuning. Evaluated on five benchmarks, the approach achieves up to a 40.3% improvement over a 7B baseline—matching the performance of significantly larger models—while requiring only 30.3% of the data volume used by Gemini 2.5 Pro's offline augmentation method.
📝 Abstract
Training capable Large Language Model (LLM) agents is critically bottlenecked by the high cost and static nature of real-world interaction data. We address this by introducing GenEnv, a framework that establishes a difficulty-aligned co-evolutionary game between an agent and a scalable, generative environment simulator. Unlike traditional methods that evolve models on static datasets, GenEnv instantiates a data-evolving loop: the simulator acts as a dynamic curriculum policy, continuously generating tasks specifically tailored to the agent's "zone of proximal development". This process is guided by a simple but effective $α$-Curriculum Reward, which aligns task difficulty with the agent's current capabilities. We evaluate GenEnv on five benchmarks: API-Bank, ALFWorld, BFCL, Bamboogle, and TravelPlanner. Across these tasks, GenEnv improves agent performance by up to **+40.3%** over 7B baselines and matches or exceeds the average performance of larger models. Compared to Gemini 2.5 Pro-based offline data augmentation, GenEnv achieves better performance while using 3.3× less data. By shifting from static supervision to adaptive simulation, GenEnv provides a data-efficient pathway for scaling agent capabilities.
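The abstract does not give the functional form of the $α$-Curriculum Reward. The sketch below is one plausible reading, in which the task generator is rewarded most for producing tasks whose measured agent success rate sits near a target difficulty $α$; the function name, the linear shape, and the default $α$ are illustrative assumptions, not the paper's actual implementation:

```python
def alpha_curriculum_reward(success_rate: float, alpha: float = 0.5) -> float:
    """Hypothetical generator reward: peaks at 1.0 when the agent's
    success rate on generated tasks equals the target difficulty alpha,
    and decays linearly as tasks become too easy or too hard."""
    return 1.0 - abs(success_rate - alpha)

# Tasks the agent always solves (too easy) or never solves (too hard)
# earn the generator less reward than tasks sitting in the agent's
# "zone of proximal development".
easy_r = alpha_curriculum_reward(1.0)  # agent solves every task
zpd_r = alpha_curriculum_reward(0.5)   # mid-difficulty tasks
hard_r = alpha_curriculum_reward(0.0)  # agent solves none
```

Under this shaping, the simulator and agent co-evolve: as the agent improves, tasks that used to be mid-difficulty become easy, their reward drops, and the generator is pushed toward harder tasks.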