Gaga: Group Any Gaussians via 3D-aware Memory Bank

📅 2024-04-11

🏛️ arXiv.org

📈 Citations: 10

✨ Influential: 1

🤖 AI Summary

This work addresses open-world 3D scene reconstruction and segmentation, where existing methods assume continuous view changes and struggle with inconsistent masks from zero-shot, class-agnostic 2D segmentation models. It proposes Gaga, a framework built on a 3D-aware memory bank and 3D Gaussian splatting representations. Using spatial information, Gaga associates object masks across sparse and diverse camera poses, which removes the continuous-view assumption and lets it accommodate 2D masks from different segmentation models. In qualitative and quantitative evaluations, Gaga performs favorably against state-of-the-art methods and improves mask label consistency, pointing to applications in 3D scene understanding and manipulation.

📝 Abstract

We introduce Gaga, a framework that reconstructs and segments open-world 3D scenes by leveraging inconsistent 2D masks predicted by zero-shot class-agnostic segmentation models. In contrast to prior 3D scene segmentation approaches that rely on video object tracking or contrastive learning methods, Gaga utilizes spatial information and effectively associates object masks across diverse camera poses through a novel 3D-aware memory bank. By eliminating the assumption of continuous view changes in training images, Gaga demonstrates robustness to variations in camera poses, particularly beneficial for sparsely sampled images, ensuring precise mask label consistency. Furthermore, Gaga accommodates 2D segmentation masks from diverse sources and demonstrates robust performance with different open-world zero-shot class-agnostic segmentation models, significantly enhancing its versatility. Extensive qualitative and quantitative evaluations demonstrate that Gaga performs favorably against state-of-the-art methods, emphasizing its potential for real-world applications such as 3D scene understanding and manipulation.

Problem

Research questions and friction points this paper is trying to address.

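Reconstructing and segmenting open-world 3D scenes from inconsistent 2D masks

Associating object masks across diverse camera poses without assuming continuous view changes

Accommodating 2D segmentation masks from different zero-shot class-agnostic models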

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses 3D-aware memory bank for mask association

Leverages zero-shot class-agnostic segmentation models

Robust to camera pose variations, including sparsely sampled views
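
The summary above does not spell out how the memory bank associates masks, so the following is only a minimal sketch of one way a memory-bank-style association could work. It assumes each 2D mask has already been mapped to the set of 3D Gaussian indices it covers, and it matches masks to stored groups by the fraction of shared Gaussians. The function names, the `overlap_threshold` value, and the toy data are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Set


def associate_mask(
    memory_bank: List[Set[int]],
    mask_gaussians: Set[int],
    overlap_threshold: float = 0.1,  # hypothetical value, chosen for illustration
) -> int:
    """Assign one mask to a group in the memory bank and return the group ID.

    memory_bank[i] holds the Gaussian indices collected so far for group i.
    The bank is updated in place.
    """
    best_group, best_overlap = -1, 0.0
    if mask_gaussians:
        for group_id, group in enumerate(memory_bank):
            # Fraction of this mask's Gaussians already stored in the group.
            overlap = len(group & mask_gaussians) / len(mask_gaussians)
            if overlap > best_overlap:
                best_group, best_overlap = group_id, overlap

    if best_group >= 0 and best_overlap >= overlap_threshold:
        # Matched an existing group: extend it with the newly seen Gaussians.
        memory_bank[best_group] |= mask_gaussians
        return best_group

    # No sufficiently overlapping group: start a new one.
    memory_bank.append(set(mask_gaussians))
    return len(memory_bank) - 1


def associate_views(views: List[Dict[int, Set[int]]]) -> List[Dict[int, int]]:
    """Map each view's local mask IDs to group IDs shared across views."""
    memory_bank: List[Set[int]] = []
    labels = []
    for masks in views:
        labels.append(
            {mask_id: associate_mask(memory_bank, g) for mask_id, g in masks.items()}
        )
    return labels


# Toy input: three views, each mapping a per-view mask ID to Gaussian indices.
views = [
    {0: {1, 2, 3}, 1: {10, 11}},
    {0: {10, 11, 12}, 1: {2, 3, 4}},  # same objects, different local mask IDs
    {0: {20, 21}},                    # an object not seen before
]
labels = associate_views(views)
print(labels)
```

Because matching depends only on shared 3D Gaussians and not on the order or adjacency of views, a sketch like this needs no continuous-view assumption. It does not prevent two masks from the same view landing in the same group, which a fuller implementation would likely need to handle.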

🔎 Similar Papers

GaussianBlock: Building Part-Aware Compositional and Editable 3D Scene by Primitives and Gaussians