AI Summary
To address SAM's lack of semantic awareness and its inability to support open-vocabulary, multi-granularity semantic segmentation, this paper proposes a dual-type composable prompting framework: Type-I prompts semantically align textual class labels with SAM's segmentation patches; Type-II prompts model instance consistency by judging, through a unified affinity computation between semantic/instance queries and SAM patches, whether two patches with the same label belong to the same instance. The method requires no fine-tuning of SAM, integrating zero-shot SAM segmentation, CLIP-based text-image matching, affinity graph construction, and hierarchical patch merging. It supports semantic, instance, and panoptic segmentation in both open- and closed-vocabulary settings. On open-vocabulary segmentation benchmarks, it achieves state-of-the-art performance, significantly outperforming existing adaptation methods across multiple datasets. Notably, it is the first framework to enable single-model, zero-shot, multi-granularity, open-vocabulary, semantic-aware segmentation.
Abstract
The Segment Anything model (SAM) has shown a generalized ability to group image pixels into patches, but applying it to semantic-aware segmentation still faces major challenges. This paper presents SAM-CP, a simple approach that establishes two types of composable prompts beyond SAM and composes them for versatile segmentation. Specifically, given a set of classes (in texts) and a set of SAM patches, the Type-I prompt judges whether a SAM patch aligns with a text label, and the Type-II prompt judges whether two SAM patches with the same text label also belong to the same instance. To decrease the complexity in dealing with a large number of semantic classes and patches, we establish a unified framework that calculates the affinity between (semantic and instance) queries and SAM patches and merges patches with high affinity to the query. Experiments show that SAM-CP achieves semantic, instance, and panoptic segmentation in both open and closed domains. In particular, it achieves state-of-the-art performance in open-vocabulary segmentation. Our research offers a novel and generalized methodology for equipping vision foundation models like SAM with multi-grained semantic perception abilities.
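The core mechanism described above, computing the affinity between (semantic and instance) queries and SAM patches, then merging patches with high affinity to each query, can be sketched as follows. This is an illustrative sketch, not the paper's implementation: the function name `merge_patches_by_affinity`, the cosine-similarity affinity, and the threshold `tau` are assumptions for demonstration; the paper's actual affinity is learned within its unified framework.

```python
import numpy as np

def merge_patches_by_affinity(query_emb, patch_emb, patch_masks, tau=0.5):
    """Assign SAM patches to queries by affinity and merge their masks.

    query_emb:   (Q, D) embeddings of semantic/instance queries
    patch_emb:   (P, D) embeddings of SAM patches
    patch_masks: (P, H, W) boolean masks, one per SAM patch
    tau:         affinity threshold (an assumed hyperparameter)
    """
    # L2-normalize so the dot product is cosine affinity
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    p = patch_emb / np.linalg.norm(patch_emb, axis=1, keepdims=True)
    affinity = q @ p.T  # (Q, P) affinity matrix

    merged = []
    for qi in range(affinity.shape[0]):
        # Patches whose affinity to this query exceeds the threshold
        idx = np.where(affinity[qi] > tau)[0]
        if idx.size == 0:
            merged.append(np.zeros(patch_masks.shape[1:], dtype=bool))
            continue
        # Merge the selected patch masks into one segment for this query
        merged.append(np.any(patch_masks[idx], axis=0))
    return affinity, merged
```

A semantic query would yield one merged mask per class (semantic segmentation), while instance queries split same-class patches into separate objects; combining both gives panoptic output, matching the multi-granularity behavior the abstract describes.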