SAGOnline: Segment Any Gaussians Online

📅 2025-08-11

📈 Citations: 0

✨ Influential: 0

career value

184K/year

🤖 AI Summary

Addressing the challenge of simultaneously achieving efficiency, 3D spatial reasoning, and multi-object tracking in 3D Gaussian Splatting (3DGS), this paper introduces the first lightweight zero-shot framework for real-time 3D segmentation and cross-view lossless multi-object tracking within Gaussian scenes. Methodologically, it decouples mask generation from 2D video foundation models (e.g., SAM2) and 3D Gaussian instance annotation: initial 2D segmentation is obtained via view-consistent mask propagation, followed by GPU-accelerated 3D mask reconstruction and explicit primitive-level Gaussian labeling to achieve full 3D instance segmentation. This framework enables, for the first time, fine-grained, primitive-level Gaussian annotation without any 3D training or adaptation. Evaluated on NVOS and Spin-NeRF, it achieves 92.7% and 95.2% mIoU, respectively, with only 27 ms per frame inference—15× to 1500× faster than state-of-the-art methods.

Technology Category

Application Category

📝 Abstract

3D Gaussian Splatting (3DGS) has emerged as a powerful paradigm for explicit 3D scene representation, yet achieving efficient and consistent 3D segmentation remains challenging. Current methods suffer from prohibitive computational costs, limited 3D spatial reasoning, and an inability to track multiple objects simultaneously. We present Segment Any Gaussians Online (SAGOnline), a lightweight and zero-shot framework for real-time 3D segmentation in Gaussian scenes that addresses these limitations through two key innovations: (1) a decoupled strategy that integrates video foundation models (e.g., SAM2) for view-consistent 2D mask propagation across synthesized views; and (2) a GPU-accelerated 3D mask generation and Gaussian-level instance labeling algorithm that assigns unique identifiers to 3D primitives, enabling lossless multi-object tracking and segmentation across views. SAGOnline achieves state-of-the-art performance on NVOS (92.7% mIoU) and Spin-NeRF (95.2% mIoU) benchmarks, outperforming Feature3DGS, OmniSeg3D-gs, and SA3D by 15--1500 times in inference speed (27 ms/frame). Qualitative results demonstrate robust multi-object segmentation and tracking in complex scenes. Our contributions include: (i) a lightweight and zero-shot framework for 3D segmentation in Gaussian scenes, (ii) explicit labeling of Gaussian primitives enabling simultaneous segmentation and tracking, and (iii) the effective adaptation of 2D video foundation models to the 3D domain. This work allows real-time rendering and 3D scene understanding, paving the way for practical AR/VR and robotic applications.

Problem

Research questions and friction points this paper is trying to address.

Efficient and consistent 3D segmentation in Gaussian scenes

High computational costs and limited 3D spatial reasoning

Inability to track multiple objects simultaneously in 3D

Innovation

Methods, ideas, or system contributions that make the work stand out.

Decoupled strategy with video foundation models

GPU-accelerated 3D mask generation algorithm

Explicit labeling of Gaussian primitives for tracking

🔎 Similar Papers

Segment Any 3D Gaussians