🤖 AI Summary
This work addresses natural-language-guided object segmentation in 3D Gaussian scenes, where occlusion, novel viewpoints, and complex spatial relations (e.g., “the red chair on the left”) make multimodal understanding difficult. To tackle these challenges, we propose ReferSplat, a framework that explicitly models spatially aware associations between 3D Gaussian primitives and linguistic descriptions. We also define the Referring 3D Gaussian Splatting Segmentation (R3DGS) task and introduce Ref-LERF, the first benchmark dataset dedicated to it. Our method combines 3D Gaussian splatting representations, cross-modal semantic alignment, and explicit spatial relation reasoning. Evaluated on the new R3DGS task and on established 3D open-vocabulary segmentation benchmarks, ReferSplat achieves state-of-the-art performance. It offers a new paradigm for language-driven 3D scene understanding, aimed at robustness to viewpoint and occlusion changes and at generalization to unseen categories and spatial configurations.
📝 Abstract
We introduce Referring 3D Gaussian Splatting Segmentation (R3DGS), a new task that aims to segment target objects in a 3D Gaussian scene based on natural language descriptions, which often contain spatial relationships or object attributes. This task requires the model to identify newly described objects that may be occluded or not directly visible in a novel view, posing a significant challenge for 3D multi-modal understanding. Developing this capability is crucial for advancing embodied AI. To support research in this area, we construct the first R3DGS dataset, Ref-LERF. Our analysis reveals that 3D multi-modal understanding and spatial relationship modeling are key challenges for R3DGS. To address these challenges, we propose ReferSplat, a framework that explicitly models 3D Gaussian points with natural language expressions in a spatially aware paradigm. ReferSplat achieves state-of-the-art performance on both the newly proposed R3DGS task and 3D open-vocabulary segmentation benchmarks. Dataset and code are available at https://github.com/heshuting555/ReferSplat.
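To make the task's inputs and outputs concrete, here is a minimal, hypothetical sketch of a generic similarity-based baseline. It is not ReferSplat's method, which the abstract does not detail. It assumes that each Gaussian carries a language-aligned feature vector and that the referring expression has already been encoded into the same space. The function names, the clamped cosine relevance score, and the 0.5 threshold are all illustrative assumptions. Per-Gaussian relevance is alpha-composited front to back for each pixel, as in standard 3D Gaussian splatting rendering, so a Gaussian hidden behind an opaque, unrelated Gaussian contributes little to that pixel's mask value.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical sketch: a generic similarity-based referring-segmentation baseline
# over 3D Gaussians. Not ReferSplat's method; names and threshold are assumptions.

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def gaussian_relevance(features: List[Vector], text_emb: Vector) -> List[float]:
    """Score each Gaussian against the encoded expression, clamping negatives to 0."""
    return [max(0.0, cosine(f, text_emb)) for f in features]


def composite_pixel(relevance: List[float], hits: List[Tuple[int, float]]) -> float:
    """Alpha-composite relevance for one pixel.

    hits: (gaussian_index, alpha) pairs for Gaussians covering the pixel,
    sorted front to back; alpha is assumed to already include the 2D falloff.
    """
    value = 0.0
    transmittance = 1.0
    for idx, alpha in hits:
        value += relevance[idx] * alpha * transmittance
        transmittance *= 1.0 - alpha
    return value


def referring_mask(
    features: List[Vector],
    text_emb: Vector,
    pixels: List[List[Tuple[int, float]]],
    threshold: float = 0.5,
) -> List[bool]:
    """Binary mask over the pixels of a (novel) view, given per-pixel sorted hits."""
    relevance = gaussian_relevance(features, text_emb)
    return [composite_pixel(relevance, hits) >= threshold for hits in pixels]


if __name__ == "__main__":
    # Toy scene: Gaussian 0 matches the query, Gaussian 1 does not.
    feats = [[1.0, 0.0], [0.0, 1.0]]
    query = [1.0, 0.0]
    view = [
        [(0, 0.9)],            # matching Gaussian is in front
        [(1, 0.9), (0, 0.9)],  # matching Gaussian is behind an unrelated one
        [],                    # empty background
    ]
    print(referring_mask(feats, query, view))  # [True, False, False]
```

Because each score depends only on a Gaussian's own feature, this baseline cannot tell apart two identical chairs when the query is “the red chair on the left.” That gap is the kind of spatial-relationship modeling the abstract identifies as a key challenge for R3DGS.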