LychSim: A Controllable and Interactive Simulation Framework for Vision Research

📅 2026-05-12

🤖 AI Summary
Existing visual simulation platforms have high technical barriers and lack user-friendly, controllable, and interactive environments for non-graphics experts, which hinders efficient synthetic data generation, out-of-distribution (OOD) evaluation, and closed-loop agent testing. To address this, we propose LychSim, the first Unreal Engine 5-based simulation framework to natively integrate the Model Context Protocol (MCP). LychSim provides a lightweight Python API for language-driven dynamic scene editing and precise pose control, together with procedural high-fidelity scene generation and semantically aligned 3D annotations. The framework lowers the entry barrier and has been applied to synthetic data engines, reinforcement learning-based adversarial evaluation, and language-guided layout generation. The code and annotated datasets will be open-sourced for the community.
📝 Abstract
While self-supervised pretraining has reduced vision systems' reliance on synthetic data, simulation remains an indispensable tool for closed-loop optimization and rigorous out-of-distribution (OOD) evaluation. However, modern simulation platforms often present steep technical barriers, requiring extensive expertise in computer graphics and game development. In this work, we present LychSim, a highly controllable and interactive simulation framework built upon Unreal Engine 5 to bridge this gap. LychSim is built around three key designs: (1) a streamlined Python API that abstracts away underlying engine complexities; (2) a procedural data pipeline capable of generating diverse, high-fidelity environments with varying OOD visual challenges, paired with rich 2D and 3D ground truths; and (3) a native integration of the Model Context Protocol (MCP) that transforms the simulator into a dynamic, closed-loop playground for agentic LLMs with reasoning capabilities. We further annotate scene-level procedural rules and object-level pose alignments to enable semantically aligned 3D ground truths and automated scene modification. We demonstrate LychSim's capability across multiple downstream applications, including serving as a synthetic data engine, powering reinforcement learning-based adversarial examiners, and facilitating interactive, language-driven scene layout generation. To benefit the broader vision community, LychSim will be made publicly available, including full source code and various data annotations.
Problem

Research questions and friction points this paper is trying to address.

simulation
vision research
out-of-distribution evaluation
technical barrier
synthetic data
Innovation

Methods, ideas, or system contributions that make the work stand out.

controllable simulation
procedural data generation
out-of-distribution evaluation
Model Context Protocol
interactive LLM integration