WonderTurbo: Generating Interactive 3D World in 0.72 Seconds

📅 2025-04-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing 3D generation methods are too slow for real-time interaction. This paper introduces WonderTurbo, which the authors describe as the first real-time interactive 3D scene generation framework. It has three components: (1) StepSplat, which builds an efficient 3D geometric representation through dynamic updates; (2) QuickDepth, a lightweight depth completion module that supplies consistent depth input to StepSplat; and (3) FastPaint, a two-step diffusion model for instant inpainting that maintains spatial appearance consistency. The framework generates a novel view of a 3D scene in 0.72 seconds (0.26 s for geometry + 0.46 s for appearance), a 15× speedup over baseline methods, while preserving spatial consistency and output quality. The response time is therefore sub-second rather than millisecond-level.

📝 Abstract

Interactive 3D generation is gaining momentum and attracting wide attention for its potential to create immersive virtual experiences. However, a critical challenge for current 3D generation technologies is achieving real-time interactivity. To address this issue, we introduce WonderTurbo, the first real-time interactive 3D scene generation framework, capable of generating novel perspectives of 3D scenes within 0.72 seconds. Specifically, WonderTurbo accelerates both geometric and appearance modeling in 3D scene generation. For geometry, we propose StepSplat, a method that constructs efficient 3D geometric representations through dynamic updates, each taking only 0.26 seconds. Additionally, we design QuickDepth, a lightweight depth completion module that provides consistent depth input for StepSplat, further enhancing geometric accuracy. For appearance modeling, we develop FastPaint, a two-step diffusion model tailored for instant inpainting, which focuses on maintaining spatial appearance consistency. Experimental results demonstrate that WonderTurbo achieves a 15× speedup over baseline methods while preserving spatial consistency and delivering high-quality output.
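
The timing figures above can be cross-checked with a little arithmetic. The sketch below is illustrative only: the function name `latency_budget` and its dictionary keys are hypothetical, and the implied baseline latency assumes the reported 15× speedup is measured against the same end-to-end 0.72-second figure, which the abstract does not state explicitly.

```python
# Figures quoted in the abstract.
TOTAL_S = 0.72      # end-to-end time to generate a novel view
GEOMETRY_S = 0.26   # time for one StepSplat geometry update
SPEEDUP = 15        # reported speedup over baseline methods


def latency_budget(total_s, geometry_s, speedup):
    """Split the total latency and derive the baseline latency it implies.

    Hypothetical helper for illustration; not part of WonderTurbo.
    """
    remaining_s = total_s - geometry_s       # time left for everything except geometry
    implied_baseline_s = total_s * speedup   # assumes speedup applies to total latency
    return {
        "remaining_s": round(remaining_s, 2),
        "implied_baseline_s": round(implied_baseline_s, 2),
        "geometry_share": round(geometry_s / total_s, 3),
    }


budget = latency_budget(TOTAL_S, GEOMETRY_S, SPEEDUP)
print(budget)
```

Under these assumptions, geometry accounts for a little over a third of the total, and a baseline would need roughly 10.8 seconds per view.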

Problem

Research questions and friction points this paper is trying to address.

Achieving real-time interactivity in 3D generation

Accelerating geometric and appearance modeling in 3D scenes

Maintaining spatial consistency while improving generation speed

Innovation

Methods, ideas, or system contributions that make the work stand out.

Real-time 3D scene generation in 0.72 seconds

StepSplat for dynamic geometric updates in 0.26 seconds

FastPaint diffusion model for instant appearance inpainting

🔎 Similar Papers

WonderWorld: Interactive 3D Scene Generation from a Single Image