Rethinking Open-Vocabulary Segmentation of Radiance Fields in 3D Space

📅 2024-08-14
🏛️ arXiv.org
📈 Citations: 0 · Influential: 0
🤖 AI Summary
This work addresses open-vocabulary semantic segmentation in 3D scenes, achieving the first end-to-end open-vocabulary segmentation over the complete 3D volumetric space of both NeRF and 3D Gaussian Splatting (3DGS), overcoming the limitation of prior methods that produce only 2D masks. The proposed method introduces point-level language embedding field supervision, a cross-representation semantic transfer mechanism (NeRF → 3DGS), and the first geometry-semantic joint 3D query evaluation protocol. It integrates 3D point cloud language embedding learning, CLIP feature distillation, and voxel-level semantic querying. Evaluated on ScanNet and Objaverse, it achieves state-of-the-art 3D semantic segmentation accuracy while enabling real-time rendering (>60 FPS). This work establishes a novel paradigm for open-vocabulary 3D understanding.

📝 Abstract
Understanding the 3D semantics of a scene is a fundamental problem for various scenarios such as embodied agents. While NeRFs and 3DGS excel at novel-view synthesis, previous methods for understanding their semantics have been limited to incomplete 3D understanding: their segmentation results are rendered as 2D masks that do not represent the entire 3D space. To address this limitation, we redefine the problem to segment the 3D volume and propose the following methods for better 3D understanding. We directly supervise the 3D points to train the language embedding field, unlike previous methods that anchor supervision at 2D pixels. We transfer the learned language field to 3DGS, achieving the first real-time rendering speed without sacrificing training time or accuracy. Lastly, we introduce a 3D querying and evaluation protocol for assessing the reconstructed geometry and semantics together. Code, checkpoints, and annotations are available at the project page.
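The abstract's central idea, supervising the language embedding field directly at sampled 3D points rather than at rendered 2D pixels, can be illustrated as a per-point cosine-similarity loss. The sketch below is a hedged illustration, not the paper's implementation: the dimensions, variable names, and the use of random NumPy arrays in place of a real field network and real CLIP features are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: each sampled 3D point has a predicted language
# embedding (output of the field) and a target CLIP feature lifted into 3D.
# D = 8 is illustrative; CLIP features are typically 512- or 768-dimensional.
D = 8
points_pred = rng.normal(size=(5, D))    # field output at 5 sampled 3D points
points_target = rng.normal(size=(5, D))  # per-point supervision targets

def normalize(x):
    # Unit-normalize embeddings along the feature axis.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def cosine_loss(pred, target):
    # 1 - cosine similarity, averaged over points: pulls each point's
    # embedding toward its target directly in 3D, instead of anchoring
    # the supervision at rendered 2D pixels.
    sim = np.sum(normalize(pred) * normalize(target), axis=-1)
    return float(np.mean(1.0 - sim))

loss = cosine_loss(points_pred, points_target)
print(round(loss, 4))
```

A pixel-anchored variant would instead render embeddings along camera rays and compare them to 2D feature maps; the point here is only that the loss attaches to 3D samples.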
Problem

Research questions and friction points this paper is trying to address.

Improve 3D semantic segmentation in radiance fields
Enable real-time rendering with 3D Gaussian splatting
Develop new 3D querying and evaluation methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

Direct 3D point supervision
Real-time 3DGS rendering
3D querying protocol
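The 3D querying protocol listed above can be sketched minimally: each 3D point (or voxel) stores a language embedding, the text queries are embedded, and every point takes the label whose text embedding is most similar under cosine similarity. The toy 2D vectors below stand in for real CLIP features, and all names and dimensions are assumptions for illustration.

```python
import numpy as np

def normalize(x):
    # Unit-normalize embeddings along the feature axis.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Toy per-point language embeddings: "chair"-like points lean toward axis 0,
# "table"-like points toward axis 1. A real system would use CLIP features.
point_embeds = normalize(np.array([[0.9, 0.1],
                                   [0.2, 0.8],
                                   [0.7, 0.3]]))

# Toy text-query embeddings (stand-ins for a CLIP text encoder's output).
label_names = ["chair", "table"]
text_embeds = normalize(np.array([[1.0, 0.0],
                                  [0.0, 1.0]]))

# Cosine similarities (points x labels), then per-point argmax label.
sims = point_embeds @ text_embeds.T
labels = sims.argmax(axis=1)
print([label_names[i] for i in labels])  # ['chair', 'table', 'chair']
```

Evaluating the assigned labels against ground-truth point annotations, jointly with the reconstructed geometry, is what distinguishes such a 3D protocol from scoring rendered 2D masks.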
Authors

Hyunjee Lee (Yonsei University)
Youngsik Yun (Yonsei University)
Jeongmin Bae (graduate student at Yonsei University; Deep Learning, Generative models, 3D)
Seoha Kim (Yonsei University)
Youngjung Uh (Yonsei University; Generative models)