🤖 AI Summary
Existing 3D generation methods are too slow for real-time interaction. To address this, the paper introduces what it describes as the first end-to-end real-time interactive 3D scene generation framework. The method has three core components: (1) StepSplat, a geometric representation built through dynamic updates for fast geometry modeling; (2) QuickDepth, a lightweight depth completion module that supplies consistent depth input to StepSplat; and (3) FastPaint, a two-step diffusion model for instant inpainting that maintains spatial appearance consistency. Evaluated on a single consumer-grade GPU, the framework achieves an end-to-end generation latency of 0.72 seconds per novel view (0.26 s for geometry + 0.46 s for appearance), a 15× speedup over baseline methods, while preserving high visual fidelity and cross-view spatial coherence. The authors present this as the first work to enable sub-second interactive 3D world generation.
📝 Abstract
Interactive 3D generation is attracting growing attention for its potential to create immersive virtual experiences. However, a critical challenge for current 3D generation technologies is achieving real-time interactivity. To address this issue, we introduce WonderTurbo, the first real-time interactive 3D scene generation framework, capable of generating novel perspectives of 3D scenes within 0.72 seconds. Specifically, WonderTurbo accelerates both geometric and appearance modeling in 3D scene generation. For geometry, we propose StepSplat, which constructs efficient 3D geometric representations through dynamic updates, each taking only 0.26 seconds. Additionally, we design QuickDepth, a lightweight depth completion module that provides consistent depth input for StepSplat, further enhancing geometric accuracy. For appearance modeling, we develop FastPaint, a 2-step diffusion model tailored for instant inpainting, which focuses on maintaining spatial appearance consistency. Experimental results demonstrate that WonderTurbo achieves a 15× speedup over baseline methods while preserving spatial consistency and delivering high-quality output.
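
The timing figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below is only a back-of-the-envelope calculation, not part of WonderTurbo. The function names are hypothetical, and it assumes that the 0.72 s total splits cleanly into one 0.26 s geometry update plus an appearance step, and that the 15× speedup is measured on the same end-to-end latency. The text does not state which baselines were used or how long they take, so the baseline figure is an implied value, not a reported one.

```python
# Figures quoted in the text above.
TOTAL_LATENCY_S = 0.72     # end-to-end time to generate one novel view
GEOMETRY_LATENCY_S = 0.26  # one StepSplat geometry update
SPEEDUP = 15               # reported speedup over baseline methods


def appearance_latency(total_s: float, geometry_s: float) -> float:
    """Time left for appearance modeling, assuming the total is a simple sum."""
    return total_s - geometry_s


def implied_baseline_latency(total_s: float, speedup: float) -> float:
    """Baseline latency implied by the speedup (an assumption, not a reported number)."""
    return total_s * speedup


appearance_s = appearance_latency(TOTAL_LATENCY_S, GEOMETRY_LATENCY_S)
baseline_s = implied_baseline_latency(TOTAL_LATENCY_S, SPEEDUP)
views_per_second = 1.0 / TOTAL_LATENCY_S

print(f"appearance step: {appearance_s:.2f} s")
print(f"implied baseline: {baseline_s:.1f} s per view")
print(f"throughput: {views_per_second:.2f} views per second")
```

Under these assumptions the appearance step accounts for roughly two thirds of the per-view latency, and the implied baseline is on the order of ten seconds per view.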