Dr. Splat: Directly Referring 3D Gaussian Splatting via Direct Language Embedding Registration

📅 2025-02-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses two limitations of open-vocabulary 3D scene understanding with language-embedded 3D Gaussian Splatting: inefficient alignment between language features and scene geometry, and reliance on rendering and per-scene optimization. We propose a rendering-free method that associates language embeddings directly with 3D Gaussians. Building on 3D Gaussian Splatting, the approach achieves fine-grained spatial alignment by mapping each pixel ray to the Gaussians it intersects. Key contributions include: (1) a language feature registration mechanism that assigns language-aligned CLIP embeddings directly to the dominant Gaussians intersected by each pixel ray, bypassing the conventional rendering pipeline; and (2) a product quantization (PQ) scheme trained on general large-scale image data, which compresses the embeddings and is shared across scenes, eliminating the need for per-scene adaptation. Evaluated on open-vocabulary 3D semantic segmentation, 3D object localization, and 3D object selection, the method significantly outperforms prior approaches while remaining efficient and generalizing to unseen scenes.
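
The registration step summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the Gaussians hit by a pixel ray are already sorted front to back together with their per-ray opacities `alphas`, ranks them with the standard 3DGS blending weight w_i = alpha_i * prod_{j<i}(1 - alpha_j), and treats the `top_k` highest-weight Gaussians as the "dominant" ones. The function names, `top_k`, and the weighted-average aggregation are hypothetical choices.

```python
import numpy as np

def register_ray_features(gaussian_ids, alphas, pixel_feature, feats, weight_sums, top_k=2):
    """Add one pixel's embedding to the dominant Gaussians on its ray (in place).

    gaussian_ids: Gaussians hit by the ray, sorted front to back (hypothetical input)
    alphas: each Gaussian's opacity contribution along this ray, same order
    pixel_feature: the pixel's CLIP embedding
    feats, weight_sums: per-Gaussian accumulators shared across all rays
    """
    alphas = np.asarray(alphas, dtype=np.float64)
    # Transmittance reaching Gaussian i: T_i = prod_{j<i} (1 - alpha_j)
    transmittance = np.concatenate(([1.0], np.cumprod(1.0 - alphas)[:-1]))
    # Standard 3DGS blending weight w_i = alpha_i * T_i
    weights = alphas * transmittance
    # Assumption: "dominant" means the top_k Gaussians with the largest weights
    for idx in np.argsort(weights)[::-1][:top_k]:
        g = gaussian_ids[idx]
        feats[g] += weights[idx] * pixel_feature
        weight_sums[g] += weights[idx]

def finalize_features(feats, weight_sums, eps=1e-8):
    """Weighted average per Gaussian, then L2-normalize; unregistered Gaussians stay zero."""
    avg = feats / np.maximum(weight_sums, eps)[:, None]
    norms = np.linalg.norm(avg, axis=1, keepdims=True)
    return avg / np.maximum(norms, eps)

# Toy example: 4 Gaussians, 3-d "CLIP" features, one ray hitting Gaussians 2, 0, 3.
feats = np.zeros((4, 3))
weight_sums = np.zeros(4)
register_ray_features([2, 0, 3], [0.6, 0.5, 0.9], np.array([1.0, 0.0, 0.0]),
                      feats, weight_sums, top_k=2)
gaussian_embeddings = finalize_features(feats, weight_sums)
```

In this toy ray the weights are 0.6, 0.5 × 0.4 = 0.2, and 0.9 × 0.2 = 0.18, so Gaussian 3 is skipped. After every pixel of every training view has been processed this way, each row of `gaussian_embeddings` holds a registered feature without any rendering pass.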

📝 Abstract
We introduce Dr. Splat, a novel approach for open-vocabulary 3D scene understanding leveraging 3D Gaussian Splatting. Unlike existing language-embedded 3DGS methods, which rely on a rendering process, our method directly associates language-aligned CLIP embeddings with 3D Gaussians for holistic 3D scene understanding. The key to our method is a language feature registration technique in which CLIP embeddings are assigned to the dominant Gaussians intersected by each pixel ray. Moreover, we integrate Product Quantization (PQ) trained on general large-scale image data to compactly represent embeddings without per-scene optimization. Experiments demonstrate that our approach significantly outperforms existing approaches on 3D perception benchmarks, such as open-vocabulary 3D semantic segmentation, 3D object localization, and 3D object selection tasks. For video results, please visit: https://drsplat.github.io/
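
Product quantization splits each embedding into `M` sub-vectors and replaces each sub-vector with the index of its nearest centroid in a per-subspace codebook, so a Gaussian stores `M` small integers instead of a full float vector. The sketch below is illustrative only: it assumes NumPy, trains toy codebooks with plain k-means on random data (standing in for the large-scale image features the abstract mentions), and uses hypothetical sizes (64-dimensional vectors, `M = 8`, 16 centroids per subspace), whereas real CLIP embeddings are larger.

```python
import numpy as np

def train_codebooks(data, num_subspaces, num_centroids, iters=10, seed=0):
    """Plain k-means per subspace; a toy stand-in for PQ training on a large image corpus."""
    rng = np.random.default_rng(seed)
    n, d = data.shape
    assert d % num_subspaces == 0, "dimension must split evenly into subspaces"
    sub = d // num_subspaces
    codebooks = np.empty((num_subspaces, num_centroids, sub))
    for m in range(num_subspaces):
        x = data[:, m * sub:(m + 1) * sub]
        # Initialize centroids from distinct random training points.
        centroids = x[rng.choice(n, num_centroids, replace=False)].copy()
        for _ in range(iters):
            dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
            assign = dists.argmin(axis=1)
            for k in range(num_centroids):
                members = x[assign == k]
                if len(members):  # leave empty clusters where they are
                    centroids[k] = members.mean(axis=0)
        codebooks[m] = centroids
    return codebooks

def pq_encode(vectors, codebooks):
    """Map each sub-vector to the index of its nearest centroid (uint8, so at most 256 centroids)."""
    num_subspaces, _, sub = codebooks.shape
    codes = np.empty((vectors.shape[0], num_subspaces), dtype=np.uint8)
    for m in range(num_subspaces):
        x = vectors[:, m * sub:(m + 1) * sub]
        dists = ((x[:, None, :] - codebooks[m][None, :, :]) ** 2).sum(-1)
        codes[:, m] = dists.argmin(axis=1)
    return codes

def pq_decode(codes, codebooks):
    """Reconstruct approximate vectors by concatenating the selected centroids."""
    return np.concatenate(
        [codebooks[m][codes[:, m]] for m in range(codebooks.shape[0])], axis=1
    )

# Hypothetical sizes: 64-d vectors, 8 subspaces of 8 dims, 16 centroids each.
rng = np.random.default_rng(42)
train = rng.standard_normal((2000, 64))
codebooks = train_codebooks(train, num_subspaces=8, num_centroids=16)
sample = train[:100]
codes = pq_encode(sample, codebooks)
decoded = pq_decode(codes, codebooks)
mse = np.mean((sample - decoded) ** 2)
```

Because the codebooks are trained once and only `pq_encode` and `pq_decode` run per scene, the same codebooks can be reused for every scene. With these toy sizes each vector shrinks from 64 floats to 8 one-byte codes, at the cost of the reconstruction error measured by `mse`.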
Problem

Research questions and friction points this paper is trying to address.

Open-vocabulary 3D scene understanding
Direct language embedding registration
3D Gaussian Splatting enhancement
Innovation

Methods, ideas, or system contributions that make the work stand out.

Direct CLIP embedding association
Language feature registration technique
Product Quantization integration
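
Because each Gaussian carries its own embedding, a text query can be answered directly in 3D rather than through rendered feature maps. The sketch below assumes NumPy and scores each Gaussian by plain cosine similarity against a text embedding with a hypothetical fixed `threshold` of 0.5; the scoring rule used in the paper may differ, so this only illustrates rendering-free selection.

```python
import numpy as np

def select_gaussians(gaussian_feats, text_embedding, threshold=0.5):
    """Return indices of Gaussians whose embedding matches the text query, plus all scores."""
    # L2-normalize both sides so the dot product is cosine similarity.
    norms = np.linalg.norm(gaussian_feats, axis=1, keepdims=True)
    g = gaussian_feats / np.maximum(norms, 1e-8)
    t = text_embedding / np.linalg.norm(text_embedding)
    scores = g @ t
    return np.nonzero(scores >= threshold)[0], scores

# Toy 3-d embeddings; in practice these would be registered CLIP features
# and a CLIP text embedding of the query.
gaussian_feats = np.array([[1.0, 0.0, 0.0],
                           [0.0, 1.0, 0.0],
                           [0.8, 0.6, 0.0],
                           [0.0, 0.0, 0.0]])  # an unregistered Gaussian
text_embedding = np.array([1.0, 0.0, 0.0])
selected, scores = select_gaussians(gaussian_feats, text_embedding)
```

Here Gaussians 0 and 2 are selected (scores 1.0 and 0.8), while the unregistered Gaussian scores 0 and is ignored.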