Scaling Agent Learning via Experience Synthesis

📅 2025-11-05

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Reinforcement learning (RL) for autonomous agents is hard to scale because environment interaction is costly, reward signals are unreliable, task diversity is limited, and the required infrastructure is complex. To address these bottlenecks, we propose DreamGym, the first unified framework for scalable experience synthesis. It distills environment dynamics into a large-model-driven *reasoning-based experience model* that derives consistent state transitions and feedback signals through step-by-step reasoning. Combined with adaptive task generation and an experience replay buffer that is initialized with offline real-world data and enriched with fresh interactions, DreamGym supports stable online curriculum learning without expensive real-environment rollouts. On non-RL-ready benchmarks such as WebArena, DreamGym outperforms all baselines by over 30%; in RL-ready but costly settings, it matches GRPO and PPO performance using only synthetic interactions. A policy trained purely on synthetic experience and then transferred to real-environment RL achieves significant additional gains while requiring far fewer real-world interactions.

📝 Abstract

While reinforcement learning (RL) can empower autonomous agents by enabling self-improvement through interaction, its practical adoption remains challenging due to costly rollouts, limited task diversity, unreliable reward signals, and infrastructure complexity, all of which obstruct the collection of scalable experience data. To address these challenges, we introduce DreamGym, the first unified framework designed to synthesize diverse experiences with scalability in mind to enable effective online RL training for autonomous agents. Rather than relying on expensive real-environment rollouts, DreamGym distills environment dynamics into a reasoning-based experience model that derives consistent state transitions and feedback signals through step-by-step reasoning, enabling scalable agent rollout collection for RL. To improve the stability and quality of transitions, DreamGym leverages an experience replay buffer initialized with offline real-world data and continuously enriched with fresh interactions to actively support agent training. To improve knowledge acquisition, DreamGym adaptively generates new tasks that challenge the current agent policy, enabling more effective online curriculum learning. Experiments across diverse environments and agent backbones demonstrate that DreamGym substantially improves RL training, both in fully synthetic settings and in sim-to-real transfer scenarios. On non-RL-ready tasks like WebArena, DreamGym outperforms all baselines by over 30%. And in RL-ready but costly settings, it matches GRPO and PPO performance using only synthetic interactions. When transferring a policy trained purely on synthetic experiences to real-environment RL, DreamGym yields significant additional performance gains while requiring far fewer real-world interactions, providing a scalable warm-start strategy for general-purpose RL.
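The rollout loop described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: `toy_experience_model` is a hypothetical rule-based stand-in for the reasoning-based experience model, and passing sampled buffer entries to it as context is an assumption about how the replay buffer supports transition quality.

```python
import random
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    state: str
    action: str
    next_state: str
    reward: float
    synthetic: bool  # False for offline real-world data, True for synthesized steps


class ReplayBuffer:
    """Buffer initialized with offline transitions and enriched with fresh ones."""

    def __init__(self, offline, capacity=1000):
        self._items = deque(offline, maxlen=capacity)

    def add(self, transition):
        self._items.append(transition)

    def sample(self, k, rng):
        items = list(self._items)
        return rng.sample(items, min(k, len(items)))

    def __len__(self):
        return len(self._items)


def toy_experience_model(state, action, context):
    """Hypothetical stand-in: a real experience model would reason step by step
    over the state, action, and context. Here a fixed rule decides the outcome."""
    next_state = f"{state}>{action}"
    reward = 1.0 if action == "submit" else 0.0
    return Transition(state, action, next_state, reward, synthetic=True)


def synthetic_rollout(policy, buffer, task, horizon, rng):
    """Collect one trajectory without touching a real environment."""
    state, trajectory = task, []
    for _ in range(horizon):
        context = buffer.sample(2, rng)  # assumption: past transitions guide the model
        step = toy_experience_model(state, policy(state), context)
        buffer.add(step)  # fresh interactions enrich the buffer
        trajectory.append(step)
        state = step.next_state
        if step.reward > 0:  # toy termination rule: stop on success
            break
    return trajectory
```

In a real setup, the collected trajectories would feed an RL update such as PPO or GRPO; that step is omitted here.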

Problem

Research questions and friction points this paper is trying to address.

Addresses costly real-environment rollouts in reinforcement learning training

Mitigates limited task diversity and unreliable reward signals in agent training

Overcomes infrastructure complexity obstructing scalable experience data collection

Innovation

Methods, ideas, or system contributions that make the work stand out.

DreamGym synthesizes diverse experience at scale for online RL training instead of relying on real-environment rollouts

Uses reasoning-based model for state transitions and feedback

Adaptively generates challenging tasks for curriculum learning
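One way to picture the curriculum step is a selector that picks which existing tasks to use as seeds for new, harder variations. The summary does not state the selection criterion, so the heuristic below (prefer tasks whose success rate is closest to 0.5, meaning neither solved nor hopeless) is an assumption, and `pick_challenging_tasks` is a hypothetical helper. Generating the variations themselves would be left to a model and is not shown.

```python
def success_rate(outcomes):
    """Fraction of successful attempts; unseen tasks default to 0.5."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.5


def pick_challenging_tasks(history, k):
    """Return the k tasks whose success rate is closest to 0.5.

    history maps a task name to a list of 0/1 outcomes for the current policy.
    Ties are broken by task name so the result is deterministic.
    """
    ranked = sorted(
        history,
        key=lambda task: (abs(success_rate(history[task]) - 0.5), task),
    )
    return ranked[:k]


history = {
    "login": [1, 1, 1, 1],     # already mastered
    "checkout": [1, 0, 1, 0],  # mixed results: most informative under this heuristic
    "refund": [0, 0, 0, 0],    # currently out of reach
    "search": [1, 0, 0, 0],
}
seeds = pick_challenging_tasks(history, 2)
```

With this toy history, `seeds` is `["checkout", "search"]`.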
