SAM-CP: Marrying SAM with Composable Prompts for Versatile Segmentation

📅 2024-07-23
🏛️ arXiv.org
📈 Citations: 1
✨ Influential: 0
🤖 AI Summary
To address SAM's lack of semantic awareness and its inability to support open-vocabulary and multi-granularity semantic segmentation, this paper proposes a dual-type composable prompting framework: Type-I prompts align textual class labels with SAM's base segmentation tokens semantically; Type-II prompts model instance consistency by unifying affinity modeling between semantic/instance queries and SAM tokens. The method requires no fine-tuning, integrating zero-shot SAM segmentation, CLIP-based text–image matching, affinity graph construction, and hierarchical token merging. It supports semantic, instance, and panoptic segmentation in both open- and closed-vocabulary settings. On open-vocabulary segmentation benchmarks, it achieves state-of-the-art performance, significantly outperforming existing adaptation methods across multiple datasets. Notably, it is the first framework to enable single-model, zero-shot, multi-granularity, open-vocabulary, semantic-aware segmentation.
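The two prompt types can be illustrated with a toy sketch on embedding vectors. Note this is not the paper's implementation: the cosine-similarity scoring, the 0.5 thresholds, and the union-find grouping are all illustrative assumptions standing in for the learned prompt modules.

```python
import numpy as np

def type1_label_patches(patch_emb, text_emb, labels, thresh=0.5):
    """Type-I sketch: give each SAM patch the text label with the
    highest cosine similarity, if it clears a threshold (assumed)."""
    p = patch_emb / np.linalg.norm(patch_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    sim = p @ t.T                      # (num_patches, num_classes)
    best = sim.argmax(axis=1)
    return [labels[j] if sim[i, j] >= thresh else None
            for i, j in enumerate(best)]

def type2_group_instances(patch_emb, patch_labels, thresh=0.5):
    """Type-II sketch: union patches that share a label and have
    high pairwise affinity into the same instance."""
    n = len(patch_labels)
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    p = patch_emb / np.linalg.norm(patch_emb, axis=1, keepdims=True)
    aff = p @ p.T                      # pairwise patch affinities
    for i in range(n):
        for j in range(i + 1, n):
            if (patch_labels[i] is not None
                    and patch_labels[i] == patch_labels[j]
                    and aff[i, j] >= thresh):
                parent[find(i)] = find(j)
    return [find(i) for i in range(n)]  # instance id per patch
```

In this toy flow, Type-I answers "which class is this patch?" and Type-II answers "do these two same-class patches belong to one instance?", which together lift class-agnostic SAM patches to instance-level semantics.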

๐Ÿ“ Abstract
The Segment Anything model (SAM) has shown a generalized ability to group image pixels into patches, but applying it to semantic-aware segmentation still faces major challenges. This paper presents SAM-CP, a simple approach that establishes two types of composable prompts beyond SAM and composes them for versatile segmentation. Specifically, given a set of classes (in texts) and a set of SAM patches, the Type-I prompt judges whether a SAM patch aligns with a text label, and the Type-II prompt judges whether two SAM patches with the same text label also belong to the same instance. To decrease the complexity in dealing with a large number of semantic classes and patches, we establish a unified framework that calculates the affinity between (semantic and instance) queries and SAM patches and merges patches with high affinity to the query. Experiments show that SAM-CP achieves semantic, instance, and panoptic segmentation in both open and closed domains. In particular, it achieves state-of-the-art performance in open-vocabulary segmentation. Our research offers a novel and generalized methodology for equipping vision foundation models like SAM with multi-grained semantic perception abilities.
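The unified framework the abstract describes (computing affinity between queries and SAM patches, then merging high-affinity patches into each query) might be sketched as below. The single-matrix assignment and the 0.3 threshold are assumptions for illustration, not the paper's trained affinity head.

```python
import numpy as np

def assign_patches_to_queries(query_emb, patch_emb, thresh=0.3):
    """Toy version of the unified affinity step: each SAM patch is
    merged into the (semantic or instance) query with the highest
    affinity, or left unassigned if nothing clears the threshold."""
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    p = patch_emb / np.linalg.norm(patch_emb, axis=1, keepdims=True)
    aff = q @ p.T                       # (num_queries, num_patches)
    best_q = aff.argmax(axis=0)         # best query for each patch
    # -1 marks patches no query claims (e.g. background/stuff leftovers)
    return np.where(aff.max(axis=0) >= thresh, best_q, -1)
```

Routing every patch through a single query–patch affinity matrix is what keeps the cost manageable when the class and patch counts grow, instead of testing all patch pairs against all class combinations.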
Problem

Research questions and friction points this paper is trying to address.

Enhancing SAM for semantic-aware segmentation with composable prompts
Reducing complexity in handling multiple semantic classes and patches
Achieving versatile segmentation in open and closed domains
Innovation

Methods, ideas, or system contributions that make the work stand out.

Composable prompts enhance SAM segmentation
Unified framework merges high-affinity patches
Achieves open-vocabulary segmentation state-of-the-art
👥 Authors

Pengfei Chen
University of Chinese Academy of Sciences, Huawei Inc.
Lingxi Xie
Huawei Inc.
Xinyue Huo
Huawei Inc., University of Science and Technology of China
Xuehui Yu
University of Chinese Academy of Sciences
Xiaopeng Zhang
Huawei Inc.
Yingfei Sun
University of Chinese Academy of Sciences
Zhenjun Han
University of Chinese Academy of Sciences
Qi Tian
Huawei Inc.