AI Summary
This work addresses the scarcity of diverse, automatically constructible interactive 3D training environments for embodied agents by introducing SimWorld Studio, an open-source platform that enables co-evolution between environment generation and embodied learning. Built on Unreal Engine 5, the platform combines LLM/VLM-driven generation, engine-level scripting, Gym-style environment export, physics checks, and vision-language model (VLM) critiques. At its core is SimCoder, a tool- and skill-augmented coding agent that writes and executes engine-level code to build physically grounded 3D environments from textual or visual instructions, and that uses agent performance feedback to generate tasks near the agent's current capability frontier. Three case studies on embodied navigation show that self-evolution improves environment generation reliability, that training in generated environments improves agent performance on unseen benchmarks, and that co-evolution yields an 18-point success-rate gain over fixed-environment learning and a 40-point gain over an untrained agent.
Abstract
LLM/VLM-based digital agents have advanced rapidly thanks to scalable sandboxes for coding, web navigation, and computer use, which provide rich interactive training grounds. In contrast, embodied agents still lack abundant, diverse, and automatically generated 3D environments for interactive learning. Existing embodied simulators rely on manually crafted scenes or procedural templates, while recent LLM-based 3D generation systems mainly produce static scenes rather than deployable environments with verifiable tasks and standard learning interfaces. We introduce SimWorld Studio, an open-source platform built on Unreal Engine 5 for generating evolving embodied learning environments. At its core is SimCoder, a tool/skill-augmented coding agent that writes and executes engine-level code to construct physically grounded 3D worlds from language/image instructions. SimCoder self-evolves by using verifier feedback (e.g., compilation errors, physics checks, VLM critiques) to revise environments and autonomously add reusable tools and skills to its library. Generated worlds are exported as Gym-style environments for embodied agent learning. SimWorld Studio further enables co-evolution between environment generation and embodied learning: agent performance feedback guides SimCoder to generate adaptive curricula near the learner's capability frontier, so that environments become increasingly challenging as the embodied agent improves. Three case studies on embodied navigation show that self-evolution improves generation reliability, generated environments substantially improve embodied agent performance that generalizes to unseen benchmarks, and co-evolution yields an 18-point success-rate gain over fixed-environment learning and a 40-point gain over an untrained agent.
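To make the idea of an adaptive curriculum near the learner's capability frontier concrete, here is a minimal Python sketch. It is not SimWorld Studio's implementation: the `CurriculumController` class, the scalar `difficulty` knob, and the success-rate thresholds are all hypothetical, chosen only to illustrate how agent performance feedback could steer the difficulty of the next generated environment.

```python
from dataclasses import dataclass


@dataclass
class CurriculumController:
    """Hypothetical difficulty controller (illustrative, not the paper's method)."""

    difficulty: float = 0.1   # assumed scalar in [0, 1] describing task hardness
    target_low: float = 0.4   # assumed: below this success rate, tasks are too hard
    target_high: float = 0.7  # assumed: above this success rate, tasks are too easy
    step: float = 0.1         # assumed adjustment size per update

    def update(self, success_rate: float) -> float:
        """Adjust difficulty from the agent's recent success rate and return it."""
        if not 0.0 <= success_rate <= 1.0:
            raise ValueError("success_rate must be in [0, 1]")
        if success_rate > self.target_high:
            # Agent succeeds too often: request harder environments.
            self.difficulty = min(1.0, self.difficulty + self.step)
        elif success_rate < self.target_low:
            # Agent fails too often: request easier environments.
            self.difficulty = max(0.0, self.difficulty - self.step)
        # Otherwise the agent is near its frontier; keep difficulty unchanged.
        return self.difficulty
```

In a loop of this shape, the returned difficulty would be passed to the environment generator as part of its next instruction, so that environments get harder as the agent improves and easier when it stalls.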