ReferSplat: Referring Segmentation in 3D Gaussian Splatting

📅 2025-08-11
🤖 AI Summary
This work addresses referring segmentation in 3D Gaussian Splatting scenes: segmenting a target object from a natural language description. Occlusion, novel viewpoints, and spatial relations in the description (e.g., “the red chair on the left”) make the task difficult for 3D multimodal understanding. The paper formalizes the task as Referring 3D Gaussian Splatting Segmentation (R3DGS) and constructs Ref-LERF, the first dataset for it. It proposes ReferSplat, a framework that explicitly models the association between 3D Gaussian points and natural language expressions in a spatially aware way. ReferSplat achieves state-of-the-art performance on both the new R3DGS task and existing 3D open-vocabulary segmentation benchmarks.

📝 Abstract
We introduce Referring 3D Gaussian Splatting Segmentation (R3DGS), a new task that aims to segment target objects in a 3D Gaussian scene based on natural language descriptions, which often contain spatial relationships or object attributes. This task requires the model to identify newly described objects that may be occluded or not directly visible in a novel view, posing a significant challenge for 3D multi-modal understanding. Developing this capability is crucial for advancing embodied AI. To support research in this area, we construct the first R3DGS dataset, Ref-LERF. Our analysis reveals that 3D multi-modal understanding and spatial relationship modeling are key challenges for R3DGS. To address these challenges, we propose ReferSplat, a framework that explicitly models 3D Gaussian points with natural language expressions in a spatially aware paradigm. ReferSplat achieves state-of-the-art performance on both the newly proposed R3DGS task and 3D open-vocabulary segmentation benchmarks. Dataset and code are available at https://github.com/heshuting555/ReferSplat.
Problem

Research questions and friction points this paper is trying to address.

Segmenting 3D objects using natural language descriptions
Handling occluded objects in novel 3D views
Modeling spatial relationships in 3D multi-modal understanding
Innovation

Methods, ideas, or system contributions that make the work stand out.

3D Gaussian Splatting for segmentation
Natural language spatial relationship modeling
State-of-the-art 3D open-vocabulary segmentation
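
To make the idea of associating Gaussians with a language query concrete, here is a minimal sketch. It is not ReferSplat's method: it assumes each Gaussian already carries a language-aligned feature vector and that the referring expression has been encoded into the same space, and it selects Gaussians by a plain cosine-similarity threshold. The names `cosine`, `select_gaussians`, `gaussian_feats`, `query_feat`, and the threshold value are all hypothetical, and the spatially aware modeling described in the paper is left out.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def select_gaussians(gaussian_feats, query_feat, threshold=0.5):
    """Return indices of Gaussians whose feature is similar to the query.

    Assumption: features and query live in the same embedding space.
    """
    return [
        i for i, feat in enumerate(gaussian_feats)
        if cosine(feat, query_feat) > threshold
    ]

# Toy scene: three Gaussians with hypothetical 3-dimensional features.
gaussian_feats = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
]
# Stand-in for an encoded referring expression.
query_feat = [1.0, 0.0, 0.0]

selected = select_gaussians(gaussian_feats, query_feat)
```

In a real pipeline the selected Gaussians would then be rendered into a 2D mask for the requested view; that step depends on the rasterizer and is omitted here.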
Authors

Shuting He
Assistant Professor, Shanghai University of Finance and Economics
Computer Vision

Guangquan Jie
Institute of Big Data, College of Computer Science and Artificial Intelligence, Fudan University, Shanghai, China

Changshuo Wang
Nanyang Technological University, Singapore

Yun Zhou
Institute of Big Data, College of Computer Science and Artificial Intelligence, Fudan University, Shanghai, China

Shuming Hu
Research Engineer, Meta
Machine Learning, Physics

Guanbin Li
Sun Yat-sen University, Guangzhou, China

Henghui Ding
Fudan University
Computer Vision, Machine Learning, Segmentation, AIGC