STORM: Spatio-Temporal Reconstruction Model for Large-Scale Outdoor Scenes

📅 2024-12-31
📈 Citations: 1
Influential: 0
🤖 AI Summary
To address the high computational cost, poor generalization, and quality degradation from noisy dynamic pseudo-labels in large-scale dynamic outdoor scene reconstruction, this paper proposes a feed-forward dynamic reconstruction method based on self-supervised scene flow. It introduces a cross-frame Gaussian aggregation mechanism that combines 3D Gaussian parameterization, per-Gaussian velocity modeling, and a Transformer architecture to achieve amodal reconstruction from arbitrary viewpoints at any timestep. The method also separates dynamic objects automatically and produces high-quality segmentation masks without instance-level annotations. In dynamic regions, PSNR increases by 4.3–6.6 dB over per-scene optimization methods and by 2.1–4.7 dB over feed-forward approaches. For scene flow, 3D end-point error (EPE) decreases by 0.422 m and Acc5 improves by 28.02%. Reconstructing a large-scale outdoor scene takes 200 ms, and the model supports real-time rendering.
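The two scene-flow metrics quoted above can be illustrated with a short NumPy sketch. The definitions here are an assumption taken from common scene-flow practice, not from this summary: EPE is the mean Euclidean distance between predicted and ground-truth flow vectors, and Acc5 is the fraction of points whose error is below 5 cm or below 5% of the ground-truth flow magnitude. The function name `scene_flow_metrics` and its threshold parameters are hypothetical.

```python
import numpy as np

def scene_flow_metrics(pred, gt, abs_thresh=0.05, rel_thresh=0.05):
    """Return (EPE in metres, Acc5 as a fraction) for (N, 3) flow arrays.

    Assumed definitions (not stated in the source):
      EPE  = mean L2 distance between predicted and true flow vectors
      Acc5 = share of points with error < 5 cm OR relative error < 5%
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    err = np.linalg.norm(pred - gt, axis=1)        # per-point end-point error
    gt_norm = np.linalg.norm(gt, axis=1)           # true flow magnitude
    rel = err / np.maximum(gt_norm, 1e-8)          # guard against zero flow
    epe = float(err.mean())
    acc5 = float(np.mean((err < abs_thresh) | (rel < rel_thresh)))
    return epe, acc5
```

Under these definitions, lower EPE and higher Acc5 both indicate better flow estimates, which matches the direction of the improvements reported in the summary.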

📝 Abstract
We present STORM, a spatio-temporal reconstruction model designed for reconstructing dynamic outdoor scenes from sparse observations. Existing dynamic reconstruction methods often rely on per-scene optimization, dense observations across space and time, and strong motion supervision, resulting in lengthy optimization times, limited generalization to novel views or scenes, and degraded quality caused by noisy pseudo-labels for dynamics. To address these challenges, STORM leverages a data-driven Transformer architecture that directly infers dynamic 3D scene representations, parameterized by 3D Gaussians and their velocities, in a single forward pass. Our key design is to aggregate 3D Gaussians from all frames using self-supervised scene flows, transforming them to the target timestep to enable complete (i.e., "amodal") reconstructions from arbitrary viewpoints at any moment in time. As an emergent property, STORM automatically captures dynamic instances and generates high-quality masks using only reconstruction losses. Extensive experiments on public datasets show that STORM achieves precise dynamic scene reconstruction, surpassing state-of-the-art per-scene optimization methods (+4.3 to 6.6 PSNR) and existing feed-forward approaches (+2.1 to 4.7 PSNR) in dynamic regions. STORM reconstructs large-scale outdoor scenes in 200 ms, supports real-time rendering, and outperforms competitors in scene flow estimation, improving 3D EPE by 0.422 m and Acc5 by 28.02%. Beyond reconstruction, we showcase four additional applications of our model, illustrating the potential of self-supervised learning for broader dynamic scene understanding.
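The aggregation step described in the abstract can be pictured with a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: each frame's Gaussian centers are moved to the target timestep with a constant-velocity model (center plus velocity times the time offset) and then concatenated. The abstract only says that Gaussians are transformed using their velocities, so the linear motion model, the function name `aggregate_gaussians`, and the array layout are all hypothetical. Other Gaussian attributes (scale, rotation, opacity, color) are ignored here.

```python
import numpy as np

def aggregate_gaussians(means, velocities, timestamps, t_target):
    """Warp per-frame Gaussian centers to t_target and merge them.

    means:      list of (N_i, 3) arrays, centers predicted from frame i
    velocities: list of (N_i, 3) arrays, per-Gaussian velocity (units/second)
    timestamps: list of floats, capture time of frame i in seconds
    t_target:   float, time at which the scene should be rendered

    Assumption: constant velocity over the short context window.
    """
    warped = []
    for mu, vel, t_src in zip(means, velocities, timestamps):
        mu = np.asarray(mu, dtype=float)
        vel = np.asarray(vel, dtype=float)
        # Static Gaussians have zero velocity and stay where they are.
        warped.append(mu + vel * (t_target - t_src))
    # One merged set of centers, covering what every frame observed.
    return np.concatenate(warped, axis=0)

# Two context frames: a moving point seen at t=0 and a static point seen at t=1.
merged = aggregate_gaussians(
    means=[np.array([[0.0, 0.0, 0.0]]), np.array([[5.0, 0.0, 0.0]])],
    velocities=[np.array([[1.0, 0.0, 0.0]]), np.array([[0.0, 0.0, 0.0]])],
    timestamps=[0.0, 1.0],
    t_target=0.5,
)
```

Because every frame contributes Gaussians to the merged set, regions occluded in one view can be filled in by another, which is one way to read the "amodal" claim in the abstract.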
Problem

Research questions and friction points this paper is trying to address.

Dynamic Scene Reconstruction
Large-Scale Outdoor Scenes
Temporal Consistency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Data-driven Transformer
Dynamic Scene Reconstruction
Real-time Rendering