🤖 AI Summary
To address semantic misalignment in language-driven open-vocabulary segmentation for 3D Gaussian Splatting (3D-GS)—caused by neglecting view-dependent semantics—this paper proposes LaGa (Language Gaussians), the first framework to formally model and exploit such semantics. LaGa achieves cross-view semantic alignment and aggregation via object-level scene decomposition, clustering of multi-view semantic descriptors, and a view-aware dynamic reweighting mechanism, overcoming the limitations of directly projecting 2D features onto 3D Gaussians. The method tightly integrates 3D-GS rendering, open-vocabulary language embeddings, and geometry-aware semantic modeling. Evaluated on the LERF-OVS benchmark, LaGa achieves an 18.7% improvement in mean Intersection-over-Union (mIoU) over the prior state of the art. The implementation is publicly released.
📝 Abstract
Recent advancements in 3D Gaussian Splatting (3D-GS) enable high-quality 3D scene reconstruction from RGB images. Many studies extend this paradigm for language-driven open-vocabulary scene understanding. However, most of them simply project 2D semantic features onto 3D Gaussians and overlook a fundamental gap between 2D and 3D understanding: a 3D object may exhibit various semantics from different viewpoints, a phenomenon we term view-dependent semantics. To address this challenge, we propose LaGa (Language Gaussians), which establishes cross-view semantic connections by decomposing the 3D scene into objects. Then, it constructs view-aggregated semantic representations by clustering semantic descriptors and reweighting them based on multi-view semantics. Extensive experiments demonstrate that LaGa effectively captures key information from view-dependent semantics, enabling a more comprehensive understanding of 3D scenes. Notably, under the same settings, LaGa achieves a significant improvement of +18.7% mIoU over the previous SOTA on the LERF-OVS dataset. Our code is available at: https://github.com/SJTU-DeepVisionLab/LaGa.
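The cluster-then-reweight idea behind the view-aggregated representations can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function names are hypothetical, the per-view descriptors stand in for 2D semantic features (e.g., CLIP-style embeddings) gathered for one decomposed object, a plain k-means replaces whatever clustering the paper uses, and cluster-size weights stand in for the paper's view-aware dynamic reweighting.

```python
import numpy as np

def aggregate_view_semantics(descriptors, k=3, n_iter=20, seed=0):
    """Sketch: cluster an object's per-view semantic descriptors into k
    semantic modes, then weight each mode by how many views support it.
    (Hypothetical simplification of LaGa's view-aware reweighting.)"""
    rng = np.random.default_rng(seed)
    # Unit-normalize so comparisons are cosine similarities.
    X = descriptors / np.linalg.norm(descriptors, axis=1, keepdims=True)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Assign each view's descriptor to its most similar cluster center.
        assign = (X @ centers.T).argmax(axis=1)
        for j in range(k):
            mask = assign == j
            if mask.any():
                c = X[mask].mean(axis=0)
                centers[j] = c / np.linalg.norm(c)
    # Weight each semantic mode by the fraction of views that voted for it.
    weights = np.bincount(assign, minlength=k).astype(float)
    weights /= weights.sum()
    return centers, weights

def query_relevance(centers, weights, text_emb):
    """Relevance of a text query to the object: weighted sum of the
    cosine similarity between the query and each semantic mode."""
    t = text_emb / np.linalg.norm(text_emb)
    return float((weights * (centers @ t)).sum())
```

With this aggregation, an open-vocabulary query is scored against all of an object's semantic modes rather than a single averaged feature, so semantics visible only from some viewpoints still contribute to the match.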