SAGOnline: Segment Any Gaussians Online

πŸ“… 2025-08-11
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Addressing the challenge of simultaneously achieving efficiency, 3D spatial reasoning, and multi-object tracking in 3D Gaussian Splatting (3DGS), this paper introduces the first lightweight zero-shot framework for real-time 3D segmentation and cross-view lossless multi-object tracking within Gaussian scenes. Methodologically, it decouples mask generation from 2D video foundation models (e.g., SAM2) and 3D Gaussian instance annotation: initial 2D segmentation is obtained via view-consistent mask propagation, followed by GPU-accelerated 3D mask reconstruction and explicit primitive-level Gaussian labeling to achieve full 3D instance segmentation. This framework enables, for the first time, fine-grained, primitive-level Gaussian annotation without any 3D training or adaptation. Evaluated on NVOS and Spin-NeRF, it achieves 92.7% and 95.2% mIoU, respectively, with only 27 ms per frame inferenceβ€”15Γ— to 1500Γ— faster than state-of-the-art methods.

Technology Category

Application Category

πŸ“ Abstract
3D Gaussian Splatting (3DGS) has emerged as a powerful paradigm for explicit 3D scene representation, yet achieving efficient and consistent 3D segmentation remains challenging. Current methods suffer from prohibitive computational costs, limited 3D spatial reasoning, and an inability to track multiple objects simultaneously. We present Segment Any Gaussians Online (SAGOnline), a lightweight and zero-shot framework for real-time 3D segmentation in Gaussian scenes that addresses these limitations through two key innovations: (1) a decoupled strategy that integrates video foundation models (e.g., SAM2) for view-consistent 2D mask propagation across synthesized views; and (2) a GPU-accelerated 3D mask generation and Gaussian-level instance labeling algorithm that assigns unique identifiers to 3D primitives, enabling lossless multi-object tracking and segmentation across views. SAGOnline achieves state-of-the-art performance on NVOS (92.7% mIoU) and Spin-NeRF (95.2% mIoU) benchmarks, outperforming Feature3DGS, OmniSeg3D-gs, and SA3D by 15--1500 times in inference speed (27 ms/frame). Qualitative results demonstrate robust multi-object segmentation and tracking in complex scenes. Our contributions include: (i) a lightweight and zero-shot framework for 3D segmentation in Gaussian scenes, (ii) explicit labeling of Gaussian primitives enabling simultaneous segmentation and tracking, and (iii) the effective adaptation of 2D video foundation models to the 3D domain. This work allows real-time rendering and 3D scene understanding, paving the way for practical AR/VR and robotic applications.
Problem

Research questions and friction points this paper is trying to address.

Efficient and consistent 3D segmentation in Gaussian scenes
High computational costs and limited 3D spatial reasoning
Inability to track multiple objects simultaneously in 3D
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decoupled strategy with video foundation models
GPU-accelerated 3D mask generation algorithm
Explicit labeling of Gaussian primitives for tracking
πŸ”Ž Similar Papers
No similar papers found.