🤖 AI Summary
This paper addresses complete and persistent 4D scene reconstruction from monocular RGB video. We propose a novel method that decomposes dynamic scenes into rigid 3D primitives and jointly optimizes their temporally consistent rigid-body motions. Our key contributions are: (1) a primitive stitching mechanism that enforces cross-frame geometric and motion consistency; and (2) an occlusion-aware motion extrapolation strategy that enables object permanence and fully replayable temporal reconstruction. The framework integrates dense 2D correspondence estimation, motion clustering, primitive decomposition, and temporal consistency constraints. Evaluated on multi-object scanning and dynamic scene datasets, our approach achieves state-of-the-art performance in reconstruction completeness, geometric accuracy, and temporal replayability, outperforming existing methods on all three metrics.
📝 Abstract
We present a dynamic reconstruction system that receives a casual monocular RGB video as input, and outputs a complete and persistent reconstruction of the scene. In other words, we reconstruct not only the currently visible parts of the scene, but also all previously viewed parts, which enables replaying the complete reconstruction across all timesteps.
Our method decomposes the scene into a set of rigid 3D primitives, which are assumed to move rigidly throughout the scene. Using estimated dense 2D correspondences, we jointly infer the rigid motion of these primitives through an optimisation pipeline, yielding a 4D reconstruction of the scene, i.e. 3D geometry moving dynamically through time. To achieve this, we also introduce a mechanism to extrapolate motion for objects that become invisible, employing motion-grouping techniques to maintain continuity.
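As a minimal illustration of the per-primitive rigid-motion inference described above, the sketch below fits a rigid transform (rotation and translation) aligning a primitive's points between two frames using the standard Kabsch algorithm. This is a simplified stand-in for the paper's joint optimisation over all primitives and timesteps, not the actual pipeline; the function name and the toy data are illustrative.

```python
import numpy as np

def rigid_transform(P, Q):
    """Kabsch algorithm: best-fit R, t such that R @ P_i + t ≈ Q_i."""
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cP).T @ (Q - cQ)                # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cQ - R @ cP
    return R, t

# Toy example: one primitive's points at frame t and frame t+1,
# related by a known rigid motion (rotation about z + translation).
rng = np.random.default_rng(0)
P = rng.normal(size=(50, 3))
theta = 0.3
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])
t_true = np.array([0.5, -0.2, 1.0])
Q = P @ R_true.T + t_true

R_est, t_est = rigid_transform(P, Q)
```

In the full system, such per-frame rigid estimates would be optimised jointly with the primitive decomposition and extrapolated through occlusions, rather than solved independently per pair of frames.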
The resulting system enables 4D spatio-temporal awareness, offering capabilities such as replayable 3D reconstructions of articulated objects through time, multi-object scanning, and object permanence. On object-scanning and multi-object datasets, our system significantly outperforms existing methods both quantitatively and qualitatively.