Visibility-Aware Language Aggregation for Open-Vocabulary Segmentation in 3D Gaussian Splatting

📅 2025-09-05
🤖 AI Summary
This paper addresses two key challenges in open-vocabulary semantic segmentation for 3D Gaussian Splatting: (i) language feature contamination caused by redundant background Gaussians and (ii) multi-view inconsistency induced by view-specific noise. It proposes a visibility-aware language feature fusion method with two components: (1) a ray-visibility-based gating mechanism that suppresses language responses from low-contribution Gaussians, and (2) a streaming weighted geometric median in cosine space that improves the cross-view consistency of language features. The method is described as lightweight and training-free, requiring no auxiliary networks or additional supervision. On multiple open-vocabulary localization and segmentation benchmarks, it is reported to outperform existing state-of-the-art methods in accuracy and in robustness to viewpoint and occlusion changes, while running at real-time inference speed.

📝 Abstract
Recently, distilling open-vocabulary language features from 2D images into 3D Gaussians has attracted significant attention. Although existing methods achieve impressive language-based interactions with 3D scenes, we observe two fundamental issues: background Gaussians that contribute negligibly to a rendered pixel receive the same feature as the dominant foreground ones, and language embeddings are inconsistent across views because of view-specific noise. We introduce Visibility-Aware Language Aggregation (VALA), a lightweight yet effective method that computes marginal contributions for each ray and applies a visibility-aware gate to retain only visible Gaussians. Moreover, we propose a streaming weighted geometric median in cosine space to merge noisy multi-view features. Our method yields a robust, view-consistent language feature embedding in a fast and memory-efficient manner. VALA improves open-vocabulary localization and segmentation across reference datasets, consistently surpassing existing works.
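
To make the idea of per-ray marginal contributions and gating concrete, here is a minimal sketch. It assumes the standard front-to-back alpha compositing used in 3D Gaussian Splatting, where the Gaussian at depth-sorted position i contributes w_i = alpha_i * prod_{j<i} (1 - alpha_j) to the pixel. The fixed threshold `min_weight` and the function names are hypothetical; the abstract does not state the gate criterion VALA actually uses.

```python
def ray_contributions(alphas):
    """Blending weight of each depth-sorted Gaussian along one ray.

    w_i = alpha_i * prod_{j<i} (1 - alpha_j), i.e. opacity times the
    transmittance remaining when the ray reaches Gaussian i.
    """
    weights = []
    transmittance = 1.0
    for alpha in alphas:
        weights.append(alpha * transmittance)
        transmittance *= 1.0 - alpha
    return weights


def visibility_gate(alphas, min_weight=0.05):
    """Keep (index, weight) pairs whose contribution reaches min_weight.

    Assumption: a hard threshold stands in for the paper's gate, whose
    exact form is not given in the abstract.
    """
    weights = ray_contributions(alphas)
    return [(i, w) for i, w in enumerate(weights) if w >= min_weight]
```

With a nearly opaque foreground Gaussian first on the ray, the Gaussians behind it receive small weights and are dropped by the gate, so they would not be assigned the foreground pixel's language feature.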
Problem

Research questions and friction points this paper is trying to address.

Addresses background Gaussian noise in 3D segmentation
Solves multi-view inconsistency in language embeddings
Improves open-vocabulary localization and segmentation accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Visibility-aware gate retains visible Gaussians
Streaming weighted geometric median merges features
Efficiently produces a robust, view-consistent language feature embedding
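
The streaming weighted geometric median can be illustrated with a one-pass, Weiszfeld-style approximation. This is a sketch under stated assumptions, not the paper's update rule: it measures distance in cosine space by the chordal distance sqrt(2 - 2 cos) between unit vectors, reweights each incoming feature by weight / distance to the current estimate, and keeps only a running numerator and denominator. The class name, the `eps` floor, and the choice of weights (for example, per-ray contributions) are all hypothetical.

```python
import math


def normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


class StreamingCosineMedian:
    """One-pass approximation of a weighted geometric median on the unit sphere."""

    def __init__(self, dim, eps=1e-2):
        self.num = [0.0] * dim   # running sum of coef * feature
        self.den = 0.0           # running sum of coef
        self.estimate = None     # current unit-length estimate
        self.eps = eps           # floor on distance to avoid division by zero

    def update(self, feature, weight=1.0):
        x = normalize(feature)
        if self.estimate is None:
            coef = weight  # first observation: no estimate to compare against
        else:
            cos = sum(a * b for a, b in zip(self.estimate, x))
            dist = math.sqrt(max(0.0, 2.0 - 2.0 * cos))  # chordal distance
            coef = weight / max(dist, self.eps)           # far features count less
        self.num = [n + coef * xi for n, xi in zip(self.num, x)]
        self.den += coef
        if any(self.num):
            self.estimate = normalize(self.num)
        return self.estimate
```

Because the coefficient shrinks with distance from the current estimate, a single outlying view pulls the result less than it would pull a weighted mean, and memory stays constant per Gaussian regardless of how many views are processed.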

Authors
Sen Wang, Technical University of Munich
Kunyi Li, Technical University of Munich
Siyun Liang, Technical University of Munich
Elena Alegret, Technical University of Munich
Jing Ma, Ludwig Maximilian University of Munich
Nassir Navab, Professor of Computer Science, Technische Universität München
Stefano Gasperini, Postdoc at Technical University of Munich (TUM) (computer vision, deep learning, autonomous driving)