Dr. Splat: Directly Referring 3D Gaussian Splatting via Direct Language Embedding Registration

📅 2025-02-23

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses two weaknesses of open-vocabulary 3D scene understanding with language-embedded 3D Gaussian Splatting (3DGS): the reliance on a rendering process to relate language features to geometry, and the need for per-scene optimization. The proposed method, Dr. Splat, registers language-aligned CLIP embeddings directly on 3D Gaussians and aligns them spatially through pixel-ray and Gaussian intersections. Key contributions include: (1) a language feature registration technique that assigns each pixel's CLIP embedding to the dominant Gaussians intersected by that pixel's ray, bypassing the conventional rendering pipeline; and (2) a Product Quantization (PQ) scheme trained on general large-scale image data, which compresses the embeddings without per-scene optimization. On open-vocabulary 3D semantic segmentation, 3D object localization, and 3D object selection tasks, the method is reported to significantly outperform existing approaches.

📝 Abstract

We introduce Dr. Splat, a novel approach for open-vocabulary 3D scene understanding leveraging 3D Gaussian Splatting. Unlike existing language-embedded 3DGS methods, which rely on a rendering process, our method directly associates language-aligned CLIP embeddings with 3D Gaussians for holistic 3D scene understanding. The key to our method is a language feature registration technique in which CLIP embeddings are assigned to the dominant Gaussians intersected by each pixel-ray. Moreover, we integrate Product Quantization (PQ) trained on general large-scale image data to compactly represent embeddings without per-scene optimization. Experiments demonstrate that our method significantly outperforms existing approaches on 3D perception benchmarks, including open-vocabulary 3D semantic segmentation, 3D object localization, and 3D object selection tasks. For video results, please visit: https://drsplat.github.io/
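
The registration step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes that a rasterizer has already produced, for every pixel, the indices of the Gaussians its ray intersects and a per-hit blending weight; the function name `register_embeddings`, the `top_k` parameter, the `-1` padding convention, and the weighted-average aggregation are all hypothetical choices made for illustration.

```python
import numpy as np

def register_embeddings(pixel_embeddings, hit_ids, hit_weights,
                        num_gaussians, top_k=1):
    """Assign per-pixel embeddings to the dominant Gaussians on each ray.

    pixel_embeddings: (P, D) one CLIP-style embedding per pixel.
    hit_ids:          (P, K) Gaussian indices hit by each pixel ray,
                      padded with -1 (assumed convention).
    hit_weights:      (P, K) blending weight of each hit (assumed input).
    Returns (features, seen): (num_gaussians, D) unit-norm features and a
    boolean mask of Gaussians that received at least one embedding.
    """
    num_pixels, dim = pixel_embeddings.shape
    features = np.zeros((num_gaussians, dim), dtype=np.float64)
    weight_sum = np.zeros(num_gaussians, dtype=np.float64)

    # Push padded hits to the end of the ranking.
    weights = np.where(hit_ids >= 0, hit_weights, -np.inf)
    order = np.argsort(-weights, axis=1)[:, :top_k]
    top_ids = np.take_along_axis(hit_ids, order, axis=1)
    top_w = np.take_along_axis(weights, order, axis=1)

    valid = np.isfinite(top_w)
    pixel_index = np.broadcast_to(np.arange(num_pixels)[:, None], top_ids.shape)
    ids, ws, pix = top_ids[valid], top_w[valid], pixel_index[valid]

    # Unbuffered scatter-add: several pixels may hit the same Gaussian.
    np.add.at(features, ids, ws[:, None] * pixel_embeddings[pix])
    np.add.at(weight_sum, ids, ws)

    seen = weight_sum > 0
    features[seen] /= weight_sum[seen, None]
    norms = np.linalg.norm(features[seen], axis=1, keepdims=True)
    features[seen] /= np.maximum(norms, 1e-12)
    return features, seen


# Toy example: 2 pixels, 3 Gaussians, 2-D embeddings.
toy_embeddings = np.array([[1.0, 0.0], [0.0, 1.0]])
toy_ids = np.array([[0, 1], [1, -1]])
toy_weights = np.array([[0.2, 0.7], [0.9, 0.0]])
toy_features, toy_seen = register_embeddings(
    toy_embeddings, toy_ids, toy_weights, num_gaussians=3, top_k=1)
```

In the toy example both rays are dominated by Gaussian 1, so it receives a weighted blend of the two pixel embeddings while Gaussians 0 and 2 stay unassigned. Because the features live on the Gaussians themselves, a text query could then be compared against them directly, for example by cosine similarity, without rendering a feature map first.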

Problem

Research questions and friction points this paper is trying to address.

Open-vocabulary 3D scene understanding

Direct language embedding registration

3D Gaussian Splatting enhancement

Innovation

Methods, ideas, or system contributions that make the work stand out.

Direct CLIP embedding association

Language feature registration technique

Product Quantization integration
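
To make the Product Quantization idea concrete, here is a minimal NumPy sketch of generic PQ. It is not the paper's codebook or training setup: random vectors stand in for the general large-scale image data, and the helper names `train_codebooks`, `pq_encode`, and `pq_decode`, along with the sub-vector and centroid counts, are assumptions chosen for illustration. The point it shows is that the codebooks are fit once on generic data and then reused to encode any scene's embeddings as a few small integer codes.

```python
import numpy as np

def train_codebooks(data, num_subvectors, num_centroids, iters=10, seed=0):
    """Fit one small k-means codebook per sub-vector slot."""
    rng = np.random.default_rng(seed)
    n, dim = data.shape
    assert dim % num_subvectors == 0, "dimension must split evenly"
    sub_dim = dim // num_subvectors
    codebooks = np.empty((num_subvectors, num_centroids, sub_dim))
    for m in range(num_subvectors):
        chunk = data[:, m * sub_dim:(m + 1) * sub_dim]
        centroids = chunk[rng.choice(n, num_centroids, replace=False)].copy()
        for _ in range(iters):
            dist = ((chunk[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
            assign = dist.argmin(axis=1)
            for c in range(num_centroids):
                members = chunk[assign == c]
                if len(members):  # keep the old centroid if a cluster empties
                    centroids[c] = members.mean(axis=0)
        codebooks[m] = centroids
    return codebooks

def pq_encode(vectors, codebooks):
    """Replace each sub-vector by the index of its nearest centroid."""
    num_sub, num_centroids, sub_dim = codebooks.shape
    assert num_centroids <= 256, "codes are stored as uint8"
    chunks = vectors.reshape(len(vectors), num_sub, sub_dim)
    dist = ((chunks[:, :, None, :] - codebooks[None]) ** 2).sum(-1)
    return dist.argmin(axis=-1).astype(np.uint8)

def pq_decode(codes, codebooks):
    """Look up centroids and concatenate them into approximate vectors."""
    num_sub = codebooks.shape[0]
    parts = codebooks[np.arange(num_sub), codes.astype(np.intp)]
    return parts.reshape(len(codes), -1)


# Stand-in for generic training data (assumption: random, not real CLIP features).
generic_data = np.random.default_rng(1).normal(size=(500, 8))
codebooks = train_codebooks(generic_data, num_subvectors=4, num_centroids=16)

# Embeddings from a new "scene" are encoded with the same fixed codebooks.
scene_embeddings = np.random.default_rng(2).normal(size=(10, 8))
codes = pq_encode(scene_embeddings, codebooks)
approx = pq_decode(codes, codebooks)
```

Each 8-dimensional vector is stored here as four one-byte codes. Nothing in the encoding step depends on the scene being encoded, which is the property the abstract refers to as compact representation without per-scene optimization.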
