🤖 AI Summary
Existing Gaussian-splatting-based open-vocabulary 3D segmentation methods suffer from two key bottlenecks: (1) modeling a single 3D language field yields insufficient semantic discriminability, and (2) reliance on pre-computed class-agnostic segmentations propagates accumulated errors. To address these, the authors propose COS3D, a collaborative prompt-segmentation framework that jointly models an instance field and a language field within a unified *collaborative field*. COS3D constructs this field via a novel instance-to-language feature mapping and an efficient two-stage training strategy, and at inference applies an adaptive language-to-instance prompt refinement to bridge the two fields' distinct characteristics. Evaluated on ScanNetV2 and 3RScan, COS3D achieves state-of-the-art performance in open-vocabulary 3D segmentation and further enables novel image-based 3D segmentation, hierarchical segmentation, and robotics applications. The code is publicly available.
📝 Abstract
Open-vocabulary 3D segmentation is a fundamental yet challenging task, requiring a mutual understanding of both segmentation and language. However, existing Gaussian-splatting-based methods rely either on a single 3D language field, leading to inferior segmentation, or on pre-computed class-agnostic segmentations, suffering from error accumulation. To address these limitations, we present COS3D, a new collaborative prompt-segmentation framework that effectively integrates complementary language and segmentation cues throughout its entire pipeline. We first introduce the new concept of the collaborative field, comprising an instance field and a language field, as the cornerstone for collaboration. During training, to effectively construct the collaborative field, our key idea is to capture the intrinsic relationship between the instance field and the language field through a novel instance-to-language feature mapping and an efficient two-stage training strategy. During inference, to bridge the distinct characteristics of the two fields, we further design an adaptive language-to-instance prompt refinement, promoting high-quality prompt-segmentation inference. Extensive experiments not only demonstrate COS3D's leading performance over existing methods on two widely used benchmarks but also show its high potential for various applications, i.e., novel image-based 3D segmentation, hierarchical segmentation, and robotics. The code is publicly available at https://github.com/Runsong123/COS3D.
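To make the instance-to-language idea concrete, the following is a minimal, hypothetical sketch of how per-Gaussian instance features could be projected into a CLIP-like language-feature space and queried with a text prompt. All names, dimensions, and the MLP design here are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class InstanceToLanguageMapper(nn.Module):
    """Hypothetical sketch: map per-Gaussian instance features into a
    language-embedding space so they can be compared with text prompts.
    Dimensions (16 -> 512) are illustrative, not from the paper."""

    def __init__(self, inst_dim: int = 16, lang_dim: int = 512):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(inst_dim, 128),
            nn.ReLU(),
            nn.Linear(128, lang_dim),
        )

    def forward(self, inst_feats: torch.Tensor) -> torch.Tensor:
        # inst_feats: (N, inst_dim) features, one per 3D Gaussian
        lang_feats = self.mlp(inst_feats)          # (N, lang_dim)
        return F.normalize(lang_feats, dim=-1)     # unit-norm for cosine similarity

# A text prompt embedded by a CLIP-style encoder can then score every
# Gaussian by cosine similarity, giving a soft open-vocabulary mask.
mapper = InstanceToLanguageMapper()
inst_feats = torch.randn(1000, 16)                 # e.g. 1000 Gaussians
lang_feats = mapper(inst_feats)                    # (1000, 512)
text_emb = F.normalize(torch.randn(512), dim=-1)   # stand-in for a CLIP text embedding
scores = lang_feats @ text_emb                     # (1000,) cosine similarities
```

In the paper's actual pipeline the two fields are trained jointly and the text query is further refined by the adaptive language-to-instance prompt refinement; the snippet above only illustrates the direction of the feature mapping.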