MorphoSim: An Interactive, Controllable, and Editable Language-guided 4D World Simulator

📅 2025-10-05
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Current text-to-video models are limited to 2D representations and exhibit weak interactivity, failing to support the spatiotemporal environmental modeling required for robotic applications. This paper introduces the first language-guided editable 4D world simulator, integrating text-to-video generation, neural radiance fields (NeRF), and multi-view consistency optimization to enable object-level manipulation, trajectory-guided video synthesis, and feature-field distillation. Our key contribution is the first realization of real-time, language-instructed scene editing—without re-synthesis—while preserving dynamic consistency across viewpoints. Experiments demonstrate that the system achieves high visual fidelity while significantly improving spatiotemporal controllability, editing efficiency, and suitability for robot simulation tasks.

Technology Category

Application Category

📝 Abstract
World models that support controllable and editable spatiotemporal environments are valuable for robotics, enabling scalable training data, repro ducible evaluation, and flexible task design. While recent text-to-video models generate realistic dynam ics, they are constrained to 2D views and offer limited interaction. We introduce MorphoSim, a language guided framework that generates 4D scenes with multi-view consistency and object-level controls. From natural language instructions, MorphoSim produces dynamic environments where objects can be directed, recolored, or removed, and scenes can be observed from arbitrary viewpoints. The framework integrates trajectory-guided generation with feature field dis tillation, allowing edits to be applied interactively without full re-generation. Experiments show that Mor phoSim maintains high scene fidelity while enabling controllability and editability. The code is available at https://github.com/eric-ai-lab/Morph4D.
Problem

Research questions and friction points this paper is trying to address.

Generating 4D scenes with multi-view consistency and object-level controls
Enabling interactive object manipulation without full scene re-generation
Creating controllable and editable spatiotemporal environments for robotics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generates 4D scenes with multi-view consistency
Enables object-level controls via language instructions
Integrates trajectory-guided generation with feature distillation
🔎 Similar Papers
No similar papers found.