VirtualEnv: A Platform for Embodied AI Research

📅 2026-01-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing embodied intelligence evaluation platforms struggle to support fine-grained assessment of large language models (LLMs) in realistic, interactive environments. This work proposes a next-generation simulation platform built on Unreal Engine 5 that uniquely integrates gamified mechanics—such as escape-room scenarios and procedurally generated environments—with LLMs and vision-language models (VLMs), enabling agents to perform complex tasks like object manipulation, navigation, and multi-agent collaboration through natural language instructions. The platform offers a standardized, extensible multimodal evaluation framework that supports real-time environmental interaction and dynamic task generation. Comprehensive benchmarking of state-of-the-art LLMs across tasks of varying complexity quantifies their performance in adaptability, planning, and coordination. The platform has been open-sourced to advance research in embodied artificial intelligence.

📝 Abstract
As large language models (LLMs) continue to improve in reasoning and decision-making, there is a growing need for realistic and interactive environments where their abilities can be rigorously evaluated. We present VirtualEnv, a next-generation simulation platform built on Unreal Engine 5 that enables fine-grained benchmarking of LLMs in embodied and interactive scenarios. VirtualEnv supports rich agent-environment interactions, including object manipulation, navigation, and adaptive multi-agent collaboration, as well as game-inspired mechanics like escape rooms and procedurally generated environments. We provide a user-friendly API built on top of Unreal Engine, allowing researchers to deploy and control LLM-driven agents using natural language instructions. We integrate large-scale LLMs and vision-language models (VLMs), such as GPT-based models, to generate novel environments and structured tasks from multimodal inputs. Our experiments benchmark the performance of several popular LLMs across tasks of increasing complexity, analyzing differences in adaptability, planning, and multi-agent coordination. We also describe our methodology for procedural task generation, task validation, and real-time environment control. VirtualEnv is released as an open-source platform; with it, we aim to advance research at the intersection of AI and gaming, enable standardized evaluation of LLMs in embodied AI settings, and pave the way for future developments in immersive simulations and interactive entertainment.
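The listing does not reproduce VirtualEnv's actual API, but the abstract's core loop (an LLM-driven agent receiving observations from a simulated scene and acting on natural-language instructions) can be sketched in miniature. Everything below is a hypothetical stand-in: `ToyEnv`, `stub_llm_policy`, and `run_episode` are illustrative names, and the "LLM" is a trivial rule-based stub, not a real model call or the platform's interface.

```python
# Hypothetical sketch of an instruction-driven agent loop of the kind
# the abstract describes. No names here come from the real VirtualEnv API.

class ToyEnv:
    """Minimal 1-D stand-in for a simulated scene: the agent must
    reach the goal cell by moving left or right."""
    def __init__(self, goal=3):
        self.pos, self.goal = 0, goal

    def observe(self):
        # A real platform would return multimodal observations (images,
        # scene graphs); here we return a tiny state dict.
        return {"position": self.pos, "goal": self.goal}

    def step(self, action):
        if action == "move_right":
            self.pos += 1
        elif action == "move_left":
            self.pos -= 1
        return self.pos == self.goal  # done flag

def stub_llm_policy(instruction, observation):
    """Stand-in for an LLM call: map a natural-language instruction
    plus the current observation to one of the environment's actions."""
    if observation["position"] < observation["goal"]:
        return "move_right"
    return "move_left"

def run_episode(env, instruction, max_steps=10):
    """Observe -> query policy -> act, until the task completes
    or the step budget runs out."""
    for _ in range(max_steps):
        action = stub_llm_policy(instruction, env.observe())
        if env.step(action):
            return True  # task completed
    return False

done = run_episode(ToyEnv(goal=3), "walk to the goal marker")
```

In a real deployment, `stub_llm_policy` would be replaced by a call to an LLM or VLM conditioned on the instruction and the rendered observation, which is where the benchmarking of adaptability and planning described above comes in.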
Problem

Research questions and friction points this paper is trying to address.

embodied AI
large language models
interactive simulation
benchmarking
virtual environment
Innovation

Methods, ideas, or system contributions that make the work stand out.

embodied AI
Unreal Engine 5
procedural environment generation
LLM-driven agents
multimodal task generation
Kabir Swain
Massachusetts Institute of Technology
Sijie Han
University of Toronto
Ayush Raina
Sony Interactive Entertainment
Jin Zhang
Sony Interactive Entertainment
Shuang Li
Google DeepMind
Generative Modeling, Robot Learning, Computer Vision, Machine Learning
Michael Stopa
Sony Interactive Entertainment
Antonio Torralba
Professor of Computer Science, MIT
Computer Vision