ReferSplat: Referring Segmentation in 3D Gaussian Splatting

📅 2025-08-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses natural-language-guided object segmentation in 3D Gaussian scenes, where occlusion, novel viewpoints, and spatial relations in the description (e.g., “the red chair on the left”) make multimodal understanding difficult. We propose ReferSplat, a framework that explicitly models 3D Gaussian points together with natural language expressions in a spatially aware way, and we construct Ref-LERF, the first dataset for the Referring 3D Gaussian Splatting Segmentation (R3DGS) task. ReferSplat achieves state-of-the-art performance on both the new R3DGS task and established 3D open-vocabulary segmentation benchmarks.

📝 Abstract

We introduce Referring 3D Gaussian Splatting Segmentation (R3DGS), a new task that aims to segment target objects in a 3D Gaussian scene based on natural language descriptions, which often contain spatial relationships or object attributes. This task requires the model to identify newly described objects that may be occluded or not directly visible in a novel view, posing a significant challenge for 3D multi-modal understanding. Developing this capability is crucial for advancing embodied AI. To support research in this area, we construct the first R3DGS dataset, Ref-LERF. Our analysis reveals that 3D multi-modal understanding and spatial relationship modeling are key challenges for R3DGS. To address these challenges, we propose ReferSplat, a framework that explicitly models 3D Gaussian points with natural language expressions in a spatially aware paradigm. ReferSplat achieves state-of-the-art performance on both the newly proposed R3DGS task and 3D open-vocabulary segmentation benchmarks. Dataset and code are available at https://github.com/heshuting555/ReferSplat.

Problem

Research questions and friction points this paper is trying to address.

Segmenting 3D objects using natural language descriptions

Handling occluded objects in novel 3D views

Modeling spatial relationships in 3D multi-modal understanding

Innovation

Methods, ideas, or system contributions that make the work stand out.

3D Gaussian Splatting for segmentation

Natural language spatial relationship modeling

State-of-the-art 3D open-vocabulary segmentation
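To make the idea of relating Gaussian points to a language expression concrete, here is a minimal sketch. It is not the ReferSplat method: it assumes, purely for illustration, that each Gaussian already carries a feature vector and that the expression has been encoded into a vector of the same dimension. The function name `select_gaussians`, the threshold value, and the toy data are all hypothetical.

```python
import numpy as np

def select_gaussians(gaussian_feats, text_emb, threshold=0.5):
    """Return (mask, scores) marking Gaussians whose feature matches the text.

    gaussian_feats: (N, D) array, one hypothetical feature per Gaussian.
    text_emb:       (D,) array, hypothetical embedding of the expression.
    """
    # Normalize both sides so the dot product is a cosine similarity.
    g = gaussian_feats / np.linalg.norm(gaussian_feats, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb)
    scores = g @ t
    # Gaussians scoring above the threshold form the predicted segment.
    return scores > threshold, scores

# Toy example: three Gaussians with 2-D features and one query embedding.
feats = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
query = np.array([1.0, 0.0])
mask, scores = select_gaussians(feats, query, threshold=0.6)
print(mask)
```

In this toy setup the selection depends only on feature similarity. Expressions with spatial relations such as “on the left” would need position-aware modeling on top of this, which is the part the paper identifies as a key challenge.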

🔎 Similar Papers

SAGD: Boundary-Enhanced Segment Anything in 3D Gaussian via Gaussian Decomposition