Consistent Instance Field for Dynamic Scene Understanding

📅 2025-12-16
🤖 AI Summary
To address inconsistent instance representations in dynamic scene understanding, which arise from discrete tracking and view-dependent features, this paper proposes Consistent Instance Field, a continuous, probabilistic spatio-temporal instance representation. The method (1) models each space-time point with an occupancy probability and a conditional instance distribution, decoupling object identity from visibility; (2) realizes this with instance-embedded deformable 3D Gaussians that jointly encode radiance and semantics and are learned directly from RGB images and instance masks through differentiable rasterization; and (3) calibrates per-Gaussian identities and resamples Gaussians toward semantically active regions to keep instance identities consistent across views and time. On the HyperNeRF and Neu3D datasets, the authors report that the method significantly outperforms state-of-the-art approaches on novel-view panoptic segmentation and open-vocabulary 4D querying.

📝 Abstract
We introduce Consistent Instance Field, a continuous and probabilistic spatio-temporal representation for dynamic scene understanding. Unlike prior methods that rely on discrete tracking or view-dependent features, our approach disentangles visibility from persistent object identity by modeling each space-time point with an occupancy probability and a conditional instance distribution. To realize this, we introduce a novel instance-embedded representation based on deformable 3D Gaussians, which jointly encode radiance and semantic information and are learned directly from input RGB images and instance masks through differentiable rasterization. Furthermore, we introduce new mechanisms to calibrate per-Gaussian identities and resample Gaussians toward semantically active regions, ensuring consistent instance representations across space and time. Experiments on HyperNeRF and Neu3D datasets demonstrate that our method significantly outperforms state-of-the-art methods on novel-view panoptic segmentation and open-vocabulary 4D querying tasks.
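
The factorization described in the abstract (an occupancy probability per space-time point, plus an instance distribution conditioned on occupancy) can be illustrated with a small sketch. This is not the authors' implementation: the function names, the softmax over per-instance logits, and the front-to-back compositing rule are assumptions chosen for illustration, using the standard alpha-compositing scheme common to Gaussian-splatting renderers.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def joint_instance_probability(occupancy, instance_logits):
    """Hypothetical helper: p(occupied, instance=k) at one space-time point.

    Assumes the factorization
        p(occupied, k) = p(occupied) * p(k | occupied),
    where the conditional is a softmax over per-instance logits.
    """
    conditional = softmax(instance_logits)
    return [occupancy * p for p in conditional]


def composite_instances(samples):
    """Hypothetical helper: render a per-pixel instance distribution.

    `samples` is a list of (alpha, instance_logits) pairs ordered from
    near to far along a ray. Each sample contributes its conditional
    instance distribution, weighted by alpha times the transmittance
    accumulated so far (standard front-to-back compositing).
    Returns (rendered_distribution, remaining_transmittance).
    """
    num_instances = len(samples[0][1])
    rendered = [0.0] * num_instances
    transmittance = 1.0
    for alpha, logits in samples:
        weight = transmittance * alpha
        conditional = softmax(logits)
        for k in range(num_instances):
            rendered[k] += weight * conditional[k]
        transmittance *= 1.0 - alpha
    return rendered, transmittance
```

In this sketch, identity (the conditional distribution) is stored independently of visibility (alpha and transmittance), so an occluded sample keeps its instance distribution even when it contributes little to a given pixel. The rendered distribution and the remaining transmittance always sum to one.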
Problem

Research questions and friction points this paper is trying to address.

Develops a continuous spatio-temporal representation for dynamic scenes
Disentangles visibility from object identity using occupancy and instance distributions
Enables consistent instance recognition across views and time for segmentation and querying
Innovation

Methods, ideas, or system contributions that make the work stand out.

Continuous probabilistic spatio-temporal representation for dynamic scenes
Instance-embedded deformable 3D Gaussians encoding radiance and semantics
Calibration and resampling mechanisms for consistent instance identity