3D4D: An Interactive, Editable, 4D World Model via 3D Video Generation

📅 2025-11-11
🤖 AI Summary
This work addresses the challenge of generating interactive 4D (3D + time) scenes from a single static image and text prompts, so that users can explore them visually in real time. We propose a lightweight, web-native, editable 4D world model with four core components: multimodal input fusion, 4D scene generation, interactive editing, and foveated rendering guided by gaze estimation. By integrating WebGL with Supersplat for efficient rendering, the framework combines 3D video generation with eye-movement-aware rendering to achieve low-latency, high-fidelity dynamic 4D visualization directly in the browser. Experiments show significant improvements in temporal coherence, editing responsiveness, and perceptual immersion. To our knowledge, this is the first end-to-end approach that constructs fully interactive 4D environments from a single image and a text prompt. The implementation, including source code and an online demo, is publicly released.
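Gaze-guided foveated rendering spends detail where the viewer is looking and saves work in the periphery. The sketch below illustrates the idea under stated assumptions: each splat's angular distance (eccentricity) from the estimated gaze direction selects one of three quality tiers. The thresholds `FOVEA_DEG` and `PARAFOVEA_DEG`, the tier names, and all function names are hypothetical and are not taken from the 3D4D code.

```typescript
// Hypothetical sketch: choose a rendering quality tier per splat from its
// angular distance (eccentricity) to the estimated gaze direction.
// Thresholds are illustrative assumptions, not values from 3D4D.

type Vec3 = [number, number, number];
type Tier = "full" | "reduced" | "coarse";

const FOVEA_DEG = 5;      // assumed foveal radius, in degrees
const PARAFOVEA_DEG = 20; // assumed parafoveal radius, in degrees

function normalize(v: Vec3): Vec3 {
  const len = Math.hypot(v[0], v[1], v[2]);
  if (len === 0) throw new Error("zero-length vector");
  return [v[0] / len, v[1] / len, v[2] / len];
}

// Angle in degrees between the gaze ray and the ray from the eye to a point.
function eccentricityDeg(eye: Vec3, gazeDir: Vec3, point: Vec3): number {
  const g = normalize(gazeDir);
  const p = normalize([point[0] - eye[0], point[1] - eye[1], point[2] - eye[2]]);
  const dot = g[0] * p[0] + g[1] * p[1] + g[2] * p[2];
  // Clamp to [-1, 1] to guard against floating-point drift before acos.
  const clamped = Math.min(1, Math.max(-1, dot));
  return (Math.acos(clamped) * 180) / Math.PI;
}

// Map an eccentricity to a quality tier using the assumed thresholds.
function selectTier(eccDeg: number): Tier {
  if (eccDeg <= FOVEA_DEG) return "full";
  if (eccDeg <= PARAFOVEA_DEG) return "reduced";
  return "coarse";
}

// Group splat indices by tier so a renderer could spend more work near the gaze point.
function bucketByTier(eye: Vec3, gazeDir: Vec3, centers: Vec3[]): Record<Tier, number[]> {
  const buckets: Record<Tier, number[]> = { full: [], reduced: [], coarse: [] };
  centers.forEach((c, i) => {
    buckets[selectTier(eccentricityDeg(eye, gazeDir, c))].push(i);
  });
  return buckets;
}
```

With the eye at the origin looking down the negative z axis, a splat at `[0, 0, -10]` lands in the `full` tier, one at `[1, 0, -10]` (about 5.7°) in `reduced`, and one at `[10, 0, -10]` (45°) in `coarse`. Measuring eccentricity as a view angle rather than a pixel distance keeps the thresholds independent of screen resolution; how each tier is actually rendered (for example, by subsampling splats) is left open here.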

📝 Abstract
We introduce 3D4D, an interactive 4D visualization framework that integrates WebGL with Supersplat rendering. It transforms static images and text into coherent 4D scenes through four core modules and employs a foveated rendering strategy for efficient, real-time multi-modal interaction. This framework enables adaptive, user-driven exploration of complex 4D environments. The project page and code are available at https://yunhonghe1021.github.io/NOVA/.
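To make "4D scene" concrete: one simple way to represent a dynamic 3D scene is as a sequence of timestamped keyframes of splat parameters, sampled at any time t by interpolation. The sketch below assumes that representation and interpolates only splat centers, linearly; the `Keyframe` type and `sampleCenters` function are hypothetical and do not describe the 3D4D or Supersplat data format.

```typescript
// Hypothetical sketch: a 4D scene as timestamped keyframes of splat centers,
// sampled at time t by linear interpolation. Assumes keyframes are sorted by
// time and store the same splats in the same order (not the 3D4D format).

type Vec3 = [number, number, number];

interface Keyframe {
  time: number;    // timestamp in seconds
  centers: Vec3[]; // one center per splat
}

function lerp(a: number, b: number, w: number): number {
  return a + (b - a) * w;
}

function copy(c: Vec3): Vec3 {
  return [c[0], c[1], c[2]];
}

function sampleCenters(frames: Keyframe[], t: number): Vec3[] {
  if (frames.length === 0) throw new Error("no keyframes");
  const first = frames[0];
  const last = frames[frames.length - 1];
  // Outside the animation range, hold the boundary pose.
  if (t <= first.time) return first.centers.map(copy);
  if (t >= last.time) return last.centers.map(copy);
  // Advance to the segment [a, b] with a.time < t <= b.time.
  let i = 0;
  while (frames[i + 1].time < t) i++;
  const a = frames[i];
  const b = frames[i + 1];
  const w = (t - a.time) / (b.time - a.time);
  return a.centers.map((ca, k): Vec3 => {
    const cb = b.centers[k];
    return [lerp(ca[0], cb[0], w), lerp(ca[1], cb[1], w), lerp(ca[2], cb[2], w)];
  });
}
```

For keyframes at t = 0 with a splat at `[0, 0, 0]` and at t = 2 with the same splat at `[2, 4, -2]`, sampling at t = 1 places it at `[1, 2, -1]`. Linear interpolation is the simplest choice; temporal coherence in a real system may call for smoother interpolation or learned deformation.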
Problem

Research questions and friction points this paper is trying to address.

Transforms static images and text into interactive 4D scenes
Enables real-time multi-modal interaction in 4D environments
Provides adaptive user-driven exploration of complex 4D worlds
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates WebGL with Supersplat rendering technology
Transforms static images into coherent 4D scenes
Uses foveated rendering for real-time interaction