🤖 AI Summary
Problem: Existing simulation platforms suffer from limited environmental diversity, low-fidelity modeling of physical and social rules, and inadequate native support for LLM/VLM agents. Method: We propose SimWorld, a high-fidelity open-world simulation platform built on Unreal Engine 5. It features language-driven procedural world generation, integrated physics simulation and social dynamics modeling, multimodal perception, and open-vocabulary action execution over a hierarchical action space with multiple levels of abstraction. The platform supports customizable multi-agent cooperative and competitive scenarios and is compatible with mainstream models including GPT-4o, Gemini, Claude, and DeepSeek. Contribution/Results: When frontier models are deployed on long-horizon multi-agent delivery tasks, the platform reveals distinct reasoning patterns and limitations across models in strategic cooperation and competition. SimWorld is released as open source and is intended as a unified, realistic, and extensible simulation foundation for developing and evaluating LLM/VLM agents.
📝 Abstract
While LLM/VLM-powered AI agents have advanced rapidly in math, coding, and computer use, their applications in complex physical and social environments remain challenging. Building agents that can survive and thrive in the real world (for example, by autonomously earning income or running a business) requires massive-scale interaction, reasoning, training, and evaluation across diverse embodied scenarios. However, existing world simulators for such development fall short: they often rely on limited hand-crafted environments, simulate simplified game-like physics and social rules, and lack native support for LLM/VLM agents. We introduce SimWorld, a new simulator built on Unreal Engine 5, designed for developing and evaluating LLM/VLM agents in rich, real-world-like settings. SimWorld offers three core capabilities: (1) realistic, open-ended world simulation, including accurate physical and social dynamics and language-driven procedural environment generation; (2) a rich interface for LLM/VLM agents, with multimodal world inputs and open-vocabulary actions at varying levels of abstraction; and (3) diverse and extensible physical and social reasoning scenarios that are easily customizable by users. We demonstrate SimWorld by deploying frontier LLM agents (e.g., GPT-4o, Gemini-2.5-Flash, Claude-3.5, and DeepSeek-Prover-V2) on long-horizon multi-agent delivery tasks involving strategic cooperation and competition. The results reveal distinct reasoning patterns and limitations across models. We open-source SimWorld and hope it becomes a foundational platform for advancing real-world agent intelligence across disciplines: https://simworld.org.