🤖 AI Summary
This work addresses object-level segmentation in dynamic 4D Gaussian scenes, where complex motion, occlusions, and ambiguous boundaries hinder accurate delineation. The authors propose TIBR4D, a learning-free 4D Gaussian segmentation framework that lifts video segmentation masks to 4D space through two-stage iterative boundary refinement. First, Iterative Gaussian Instance Tracing (IGIT) operates on temporal segments, progressively refining Gaussian-to-instance probabilities to handle occlusions and preserve complete object structures. Second, frame-wise Gaussian Rendering Range Control (RCC) suppresses highly uncertain Gaussians near object boundaries while retaining their core contributions. A temporal segmentation merging strategy for IGIT balances identity consistency against dynamic awareness: longer segments stabilize identities, while shorter segments capture identity changes promptly. Experiments on the HyperNeRF and Neu3D datasets show that the resulting object-wise Gaussian point clouds have clearer boundaries, and that the method runs more efficiently than state-of-the-art approaches.
📝 Abstract
Object-level segmentation in dynamic 4D Gaussian scenes remains challenging due to complex motion, occlusions, and ambiguous boundaries. In this paper, we present an efficient learning-free 4D Gaussian segmentation framework that lifts video segmentation masks to 4D space. Its core is a two-stage iterative boundary refinement, TIBR4D. The first stage is Iterative Gaussian Instance Tracing (IGIT) at the temporal segment level. It progressively refines Gaussian-to-instance probabilities through iterative tracing and extracts the corresponding Gaussian point clouds, which handle occlusions better and preserve the completeness of object structures better than existing one-shot threshold-based methods. The second stage is frame-wise Gaussian Rendering Range Control (RCC), which suppresses highly uncertain Gaussians near object boundaries while retaining their core contributions, yielding more accurate boundaries. Furthermore, a temporal segmentation merging strategy is proposed for IGIT to balance identity consistency and dynamic awareness. Longer segments enforce stronger multi-frame constraints for stable identities, while shorter segments allow identity changes to be captured promptly. Experiments on HyperNeRF and Neu3D demonstrate that our method produces accurate object Gaussian point clouds with clearer boundaries and higher efficiency compared to state-of-the-art methods.
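For context, the one-shot threshold-based lifting that the abstract contrasts against can be sketched roughly as follows. This is a toy illustration of that baseline idea only, not the paper's IGIT or RCC, whose details are not given here. The ray representation, the function names (`composite_weights`, `mask_vote_ratio`, `one_shot_select`), and the threshold values are all assumptions made for the example.

```python
# Toy sketch (assumed setup): each "ray" is one pixel, described by the
# Gaussians it hits front to back, their opacities, and whether the pixel
# lies inside the 2D instance mask.

def composite_weights(alphas):
    """Front-to-back alpha-compositing weight of each Gaussian along one ray."""
    weights, transmittance = [], 1.0
    for alpha in alphas:
        weights.append(alpha * transmittance)
        transmittance *= 1.0 - alpha
    return weights


def mask_vote_ratio(rays, num_gaussians):
    """Per Gaussian: share of its rendering weight that lands on masked pixels."""
    inside = [0.0] * num_gaussians
    total = [0.0] * num_gaussians
    for gaussian_ids, alphas, in_mask in rays:
        for g, w in zip(gaussian_ids, composite_weights(alphas)):
            total[g] += w
            if in_mask:
                inside[g] += w
    return [i / t if t > 0 else 0.0 for i, t in zip(inside, total)]


def one_shot_select(rays, num_gaussians, threshold=0.5):
    """Baseline: keep Gaussians whose vote ratio exceeds a fixed threshold."""
    ratios = mask_vote_ratio(rays, num_gaussians)
    return [g for g, r in enumerate(ratios) if r > threshold]


# Hypothetical data: Gaussian 1 straddles the mask boundary.
rays = [
    ([0, 1], [0.5, 0.5], True),   # masked pixel sees Gaussians 0, then 1
    ([1, 2], [0.5, 0.5], False),  # unmasked pixel sees Gaussians 1, then 2
    ([2], [0.5], False),          # unmasked pixel sees Gaussian 2 only
]

print(one_shot_select(rays, 3))                 # boundary Gaussian dropped
print(one_shot_select(rays, 3, threshold=0.3))  # boundary Gaussian kept
```

In this toy case the boundary Gaussian is kept or dropped depending solely on the chosen threshold, which is the kind of sensitivity that iterative probability refinement and boundary-aware rendering control are meant to reduce.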