Agent World Model: Infinity Synthetic Environments for Agentic Reinforcement Learning

📅 2026-02-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Reinforcement learning for agents is limited by the scarcity of diverse and reliable interactive environments. This work proposes Agent World Model (AWM), a large-scale pipeline for generating synthetic environments that are code-driven and backed by databases. The pipeline automatically constructs 1,000 executable environments covering everyday scenarios, with 35 tools per environment on average. Because state transitions are executed by code against a database rather than simulated by an LLM, they are more consistent and reliable, and the accessible database state supports reliable reward functions. Agents trained with reinforcement learning exclusively in these synthetic environments, rather than in benchmark-specific ones, show strong out-of-distribution generalization on three benchmarks.

📝 Abstract

Recent advances in large language models (LLMs) have empowered autonomous agents to perform complex tasks that require multi-turn interactions with tools and environments. However, scaling such agent training is limited by the lack of diverse and reliable environments. In this paper, we propose Agent World Model (AWM), a fully synthetic environment generation pipeline. Using this pipeline, we scale to 1,000 environments covering everyday scenarios, in which agents can interact with rich toolsets (35 tools per environment on average) and obtain high-quality observations. Notably, these environments are code-driven and backed by databases, providing more reliable and consistent state transitions than environments simulated by LLMs. Moreover, they enable more efficient agent interaction compared with collecting trajectories from realistic environments. To demonstrate the effectiveness of this resource, we perform large-scale reinforcement learning for multi-turn tool-use agents. Thanks to the fully executable environments and accessible database states, we can also design reliable reward functions. Experiments on three benchmarks show that training exclusively in synthetic environments, rather than benchmark-specific ones, yields strong out-of-distribution generalization. The code is available at https://github.com/Snowflake-Labs/agent-world-model.
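To make the idea of a code-driven, database-backed environment concrete, here is a minimal sketch in Python. It is not taken from the AWM codebase: the `TodoEnv` class, its three tools, the `tasks` schema, and the `reward` function are all hypothetical names chosen for illustration, and a real AWM environment would expose far more tools. The sketch only shows the two properties the abstract highlights: every tool call is a deterministic database operation, and the reward is computed by inspecting the final database state rather than by judging the agent's text.

```python
import sqlite3


class TodoEnv:
    """Hypothetical toy environment: each tool is plain code over a SQLite database."""

    def __init__(self):
        # An in-memory database holds the entire environment state.
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE tasks ("
            "id INTEGER PRIMARY KEY, "
            "title TEXT NOT NULL, "
            "done INTEGER NOT NULL DEFAULT 0)"
        )
        self.db.commit()

    # --- tools (assumed names, for illustration only) ---

    def add_task(self, title):
        cur = self.db.execute("INSERT INTO tasks (title) VALUES (?)", (title,))
        self.db.commit()
        return {"ok": True, "id": cur.lastrowid}

    def complete_task(self, task_id):
        cur = self.db.execute("UPDATE tasks SET done = 1 WHERE id = ?", (task_id,))
        self.db.commit()
        if cur.rowcount == 0:
            return {"ok": False, "error": f"no task with id {task_id}"}
        return {"ok": True, "id": task_id}

    def list_tasks(self):
        rows = self.db.execute(
            "SELECT id, title, done FROM tasks ORDER BY id"
        ).fetchall()
        return [{"id": r[0], "title": r[1], "done": bool(r[2])} for r in rows]

    # --- agent-facing entry point ---

    def step(self, tool, **args):
        """Dispatch one tool call and return its observation."""
        tools = {
            "add_task": self.add_task,
            "complete_task": self.complete_task,
            "list_tasks": self.list_tasks,
        }
        if tool not in tools:
            return {"ok": False, "error": f"unknown tool: {tool}"}
        return tools[tool](**args)


def reward(env, title):
    """State-based reward: 1.0 only if a task with this title is marked done."""
    row = env.db.execute(
        "SELECT done FROM tasks WHERE title = ?", (title,)
    ).fetchone()
    return 1.0 if row is not None and row[0] == 1 else 0.0
```

In this sketch, an episode is a sequence of `env.step(...)` calls followed by one call to `reward(env, ...)`. Because the reward reads the database directly, an agent cannot earn it by merely claiming the task is finished; the corresponding row has to have changed.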

Problem

Research questions and friction points this paper is trying to address.

synthetic environments

agentic reinforcement learning

environment scalability

tool-use agents

state consistency

Innovation

Methods, ideas, or system contributions that make the work stand out.

synthetic environments

agent world model

agentic reinforcement learning