🤖 AI Summary
This work addresses open-vocabulary semantic segmentation in 3D scenes, achieving the first end-to-end open-vocabulary segmentation over the complete 3D volumetric space of both NeRF and 3D Gaussian Splatting (3DGS), overcoming the limitation of prior methods that produce only 2D masks. The proposed method introduces point-level supervision of a language embedding field, a cross-representation semantic transfer mechanism (NeRF → 3DGS), and the first evaluation protocol that jointly assesses geometry and semantics through 3D queries. It integrates 3D point-cloud language embedding learning, CLIP feature distillation, and voxel-level semantic querying. Evaluated on ScanNet and Objaverse, it achieves state-of-the-art 3D semantic segmentation accuracy while enabling real-time rendering (>60 FPS). This work establishes a new paradigm for open-vocabulary 3D understanding.
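The CLIP feature distillation mentioned above is typically framed as matching per-point embeddings predicted by the language field to target CLIP features. A minimal sketch of such a distillation objective, using a cosine-distance loss (the function name and shapes are illustrative, not the paper's actual implementation):

```python
import numpy as np

def cosine_distillation_loss(pred, target, eps=1e-8):
    """Mean cosine distance between predicted per-point language
    embeddings and target CLIP features.

    pred, target: (N, D) arrays; each row is one 3D point's embedding.
    Returns a scalar in [0, 2]; 0 means perfect alignment.
    """
    pred_n = pred / (np.linalg.norm(pred, axis=-1, keepdims=True) + eps)
    tgt_n = target / (np.linalg.norm(target, axis=-1, keepdims=True) + eps)
    return float(np.mean(1.0 - np.sum(pred_n * tgt_n, axis=-1)))
```

In practice the targets would come from a 2D CLIP-feature extractor lifted to 3D points, and the loss would be minimized with gradient descent over the field's parameters.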
📝 Abstract
Understanding the 3D semantics of a scene is a fundamental problem for applications such as embodied agents. While NeRFs and 3DGS excel at novel-view synthesis, previous methods for understanding their semantics have been limited to incomplete 3D understanding: their segmentation results are rendered as 2D masks that do not represent the entire 3D space. To address this limitation, we redefine the problem to segment the 3D volume and propose the following methods for better 3D understanding. We directly supervise the 3D points to train the language embedding field, unlike previous methods that anchor supervision at 2D pixels. We transfer the learned language field to 3DGS, attaining real-time rendering for the first time without sacrificing training time or accuracy. Lastly, we introduce a 3D querying and evaluation protocol for assessing the reconstructed geometry and semantics together. Code, checkpoints, and annotations are available at the project page.
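The 3D querying described in the abstract can be sketched as follows: embed an open-vocabulary text query, then compare it against the language embedding stored at each 3D point and keep the points above a similarity threshold. This is a minimal sketch under assumed inputs (pre-computed per-point and text embeddings, e.g. from a CLIP-like encoder; the function name and threshold are illustrative):

```python
import numpy as np

def query_volume(point_embeds, text_embed, threshold=0.5):
    """Segment the 3D point set by cosine similarity to a text query.

    point_embeds: (N, D) per-point language embeddings.
    text_embed:   (D,) embedding of the open-vocabulary query.
    Returns an (N,) boolean mask over the 3D points.
    """
    p = point_embeds / np.linalg.norm(point_embeds, axis=1, keepdims=True)
    t = text_embed / np.linalg.norm(text_embed)
    sims = p @ t  # cosine similarity per point
    return sims >= threshold
```

Because the mask lives on 3D points rather than rendered pixels, the result segments the volume itself and remains consistent across viewpoints.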