Semantic Consistent Language Gaussian Splatting for Point-Level Open-vocabulary Querying

📅 2025-03-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper addresses point-level open-vocabulary querying in 3D Gaussian Splatting, where a text query must select a subset of 3D Gaussians rather than a segmentation mask on a 2D rendering. Building on LangSplat's framework, it makes two changes: (1) masklets from the Segment Anything Model 2 (SAM2) are used to establish semantically consistent ground truth for distilling the language Gaussians, and (2) a two-step querying approach first retrieves the distilled ground truth and then uses it to query the individual Gaussians. On three benchmark datasets the method outperforms state-of-the-art approaches, including an mIoU improvement of +20.42 on the 3D-OVS dataset.

📝 Abstract

Open-vocabulary querying in 3D Gaussian Splatting aims to identify semantically relevant regions within a 3D Gaussian representation based on a given text query. Prior work, such as LangSplat, addressed this task by retrieving these regions in the form of segmentation masks on 2D renderings. More recently, OpenGaussian introduced point-level querying, which directly selects a subset of 3D Gaussians. In this work, we propose a point-level querying method that builds upon LangSplat's framework. Our approach improves the framework in two key ways: (a) we leverage masklets from the Segment Anything Model 2 (SAM2) to establish semantically consistent ground truth for distilling the language Gaussians; (b) we introduce a novel two-step querying approach that first retrieves the distilled ground truth and subsequently uses that ground truth to query the individual Gaussians. Experimental evaluations on three benchmark datasets demonstrate that the proposed method outperforms state-of-the-art approaches. For instance, our method achieves an mIoU improvement of +20.42 on the 3D-OVS dataset.
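The two-step querying described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the function names, the use of cosine similarity, the fixed `threshold`, and the idea that each distilled ground-truth entry and each Gaussian carries a single feature vector in the same space as the text embedding.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products act as cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def two_step_query(text_feat, gt_feats, gaussian_feats, threshold=0.9):
    """Hypothetical two-step query.

    text_feat:      (D,)   embedding of the text query
    gt_feats:       (K, D) one feature per distilled ground-truth entry
    gaussian_feats: (N, D) one language feature per 3D Gaussian
    Returns the index of the retrieved ground-truth entry and a boolean
    mask of shape (N,) marking the selected Gaussians.
    """
    text_feat = l2_normalize(np.asarray(text_feat, dtype=float))
    gt_feats = l2_normalize(np.asarray(gt_feats, dtype=float))
    gaussian_feats = l2_normalize(np.asarray(gaussian_feats, dtype=float))

    # Step 1: retrieve the ground-truth feature most similar to the text query.
    best = int(np.argmax(gt_feats @ text_feat))

    # Step 2: query the Gaussians with the retrieved feature, not the raw text
    # feature. The similarity threshold is an assumed selection rule.
    selected = (gaussian_feats @ gt_feats[best]) >= threshold
    return best, selected


if __name__ == "__main__":
    gt = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    gaussians = [[1.0, 0.05, 0.0], [0.0, 1.0, 0.0], [0.98, 0.0, 0.1], [0.5, 0.5, 0.0]]
    best, mask = two_step_query([0.9, 0.1, 0.0], gt, gaussians)
    print(best, mask.tolist())
```

The point of the sketch is the indirection: the text query only selects among the ground-truth features, and the Gaussians are then matched against a feature of the kind they were distilled from.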

Problem

Research questions and friction points this paper is trying to address.

Enables point-level open-vocabulary querying in 3D Gaussian Splatting

Improves semantic consistency using SAM2 masklets for distillation

Introduces two-step querying for better Gaussian retrieval accuracy

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses SAM2 masklets for semantic consistency

Introduces two-step querying approach

Improves mIoU by 20.42 on 3D-OVS
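For readers unfamiliar with the metric, mean Intersection-over-Union (mIoU) averages, over queries or classes, the ratio between the overlap and the union of a predicted mask and its reference mask. The sketch below is a generic illustration; the function names are hypothetical, and the paper's exact evaluation protocol (for example, whether masks are compared on rendered pixels or on points) is not stated in this summary.

```python
import numpy as np


def iou(pred, gt):
    """Intersection-over-Union of two boolean masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        # Both masks empty: counted as a perfect match (an assumed convention).
        return 1.0
    return float(np.logical_and(pred, gt).sum() / union)


def mean_iou(preds, gts):
    """Average IoU over paired predicted and reference masks."""
    return float(np.mean([iou(p, g) for p, g in zip(preds, gts)]))


if __name__ == "__main__":
    preds = [[1, 1, 0, 0], [0, 1, 1]]
    gts = [[1, 0, 0, 0], [0, 1, 1]]
    print(mean_iou(preds, gts))  # IoUs are 0.5 and 1.0, so the mean is 0.75
```

mIoU is commonly reported on a 0 to 100 scale, in which case a gain of +20.42 reads as an absolute difference in score rather than a relative percentage change.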
