SANSA: Unleashing the Hidden Semantics in SAM2 for Few-Shot Segmentation

📅 2025-05-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: SAM2 is optimized for visual tracking, and its semantic features are entangled with tracking-specific cues. This entanglement hinders generalization to unseen classes in few-shot segmentation.

Method: We propose SANSA, the first framework to uncover and exploit the rich high-level semantic structure latent in SAM2's vision features without modifying its weights. It combines a lightweight semantic alignment module with a feature decoupling strategy, which enables class-aware prompt-and-propagate segmentation with diverse input prompts (points, boxes, scribbles).

Results: SANSA achieves state-of-the-art performance on generalization-oriented few-shot segmentation benchmarks. It significantly outperforms general-purpose segmentation models in the in-context setting and enables efficient inference with negligible parameter overhead. Its core contribution is decoupling and explicitly exploiting SAM2's pre-trained semantic knowledge while leaving the pre-trained weights unchanged, bridging the gap between a tracking-oriented foundation model and semantic segmentation tasks.
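
The summary says SANSA adds a lightweight alignment module while leaving SAM2's weights untouched. The paper's actual module is not described on this page. The sketch below is therefore an assumption: it shows one common way to add very few trainable parameters on top of a frozen backbone, a residual bottleneck adapter. The class name, the sizes, and the zero initialisation are illustrative choices and are not taken from SANSA.

```python
import numpy as np


class BottleneckAdapter:
    """Hypothetical residual bottleneck adapter on top of frozen backbone features.

    Only these four arrays would be trained; the backbone producing `x` stays frozen.
    """

    def __init__(self, dim: int, bottleneck: int, rng: np.random.Generator):
        self.w_down = 0.02 * rng.standard_normal((dim, bottleneck))
        self.b_down = np.zeros(bottleneck)
        # Zero-initialised up-projection: at initialisation the adapter is an
        # identity map, so the frozen features pass through unchanged.
        self.w_up = np.zeros((bottleneck, dim))
        self.b_up = np.zeros(dim)

    def __call__(self, x: np.ndarray) -> np.ndarray:
        """x: (..., dim) frozen features -> adapted features of the same shape."""
        h = np.maximum(x @ self.w_down + self.b_down, 0.0)  # down-project + ReLU
        return x + h @ self.w_up + self.b_up  # up-project and add back (residual)

    def num_params(self) -> int:
        """Count of trainable adapter parameters (the frozen backbone is excluded)."""
        return sum(p.size for p in (self.w_down, self.b_down, self.w_up, self.b_up))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Illustrative sizes only: a 64x64 grid of 256-d tokens.
    frozen = rng.standard_normal((64 * 64, 256))
    adapter = BottleneckAdapter(dim=256, bottleneck=32, rng=rng)
    print(np.allclose(adapter(frozen), frozen))  # prints True: identity at init
    print(adapter.num_params())  # 256*32 + 32 + 32*256 + 256 = 16672
```

The parameter count is the point of the example. With a 256-d feature and a 32-d bottleneck, the adapter holds 16,672 parameters, which is small next to a vision backbone with millions of parameters. This is the general sense in which such modules add negligible overhead.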

📝 Abstract

Few-shot segmentation aims to segment unseen object categories from just a handful of annotated examples. This requires mechanisms that can both identify semantically related objects across images and accurately produce segmentation masks. We note that Segment Anything 2 (SAM2), with its prompt-and-propagate mechanism, offers both strong segmentation capabilities and a built-in feature matching process. However, we show that its representations are entangled with task-specific cues optimized for object tracking, which impairs its use for tasks requiring higher-level semantic understanding. Our key insight is that, despite its class-agnostic pretraining, SAM2 already encodes rich semantic structure in its features. We propose SANSA (Semantically AligNed Segment Anything 2), a framework that makes this latent structure explicit and repurposes SAM2 for few-shot segmentation through minimal task-specific modifications. SANSA achieves state-of-the-art performance on few-shot segmentation benchmarks specifically designed to assess generalization, outperforms generalist methods in the popular in-context setting, supports flexible interaction through various prompts such as points, boxes, or scribbles, and remains significantly faster and more compact than prior approaches. Code is available at https://github.com/ClaudiaCuttano/SANSA.
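
The abstract says few-shot segmentation needs a way to identify semantically related objects across images. SANSA does this through SAM2's own prompt-and-propagate mechanism, which is not reproduced here. The sketch below substitutes a much simpler illustration of cross-image feature matching, the classic masked-average-pooling prototype baseline. It averages the support features under the annotated mask, then labels query locations by their cosine similarity to that prototype. The function names, the 0.5 threshold, and the synthetic features are all hypothetical.

```python
import numpy as np


def masked_average_prototype(support_feats: np.ndarray, support_mask: np.ndarray) -> np.ndarray:
    """Mean feature vector inside the annotated support mask.

    support_feats: (H, W, C) feature map; support_mask: (H, W) binary mask.
    """
    mask = support_mask.astype(bool)
    if not mask.any():
        raise ValueError("support mask is empty")
    return support_feats[mask].mean(axis=0)  # (C,)


def cosine_similarity_map(query_feats: np.ndarray, prototype: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Cosine similarity between every query location and the prototype, shape (H, W)."""
    q = query_feats / (np.linalg.norm(query_feats, axis=-1, keepdims=True) + eps)
    p = prototype / (np.linalg.norm(prototype) + eps)
    return q @ p  # sums over the channel axis


def predict_query_mask(support_feats, support_mask, query_feats, threshold: float = 0.5) -> np.ndarray:
    """Label query locations whose similarity to the support prototype exceeds `threshold`."""
    proto = masked_average_prototype(support_feats, support_mask)
    return cosine_similarity_map(query_feats, proto) > threshold


def make_toy_features(mask: np.ndarray, channels: int, rng: np.random.Generator) -> np.ndarray:
    """Synthetic features: object pixels point along axis 0, background along axis 1, plus noise."""
    fg = np.eye(channels)[0]
    bg = np.eye(channels)[1]
    base = np.where(mask[..., None], fg, bg)
    return base + 0.05 * rng.standard_normal(mask.shape + (channels,))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    H, W, C = 8, 8, 16
    support_mask = np.zeros((H, W), dtype=bool)
    support_mask[:4, :4] = True  # object in the top-left of the support image
    query_mask = np.zeros((H, W), dtype=bool)
    query_mask[4:, 4:] = True  # same "class" in the bottom-right of the query image
    pred = predict_query_mask(
        make_toy_features(support_mask, C, rng), support_mask, make_toy_features(query_mask, C, rng)
    )
    print("IoU:", (pred & query_mask).sum() / (pred | query_mask).sum())
```

In a real pipeline the feature maps would come from a frozen encoder. Here, the toy features make object and background pixels point in different directions so the matching step is easy to follow. The paper argues that raw SAM2 features do encode such semantic structure but entangle it with tracking-specific cues, which motivates making that structure explicit rather than relying on direct matching.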

Problem

Research questions and friction points this paper is trying to address.

Enhancing few-shot segmentation with semantic alignment

Overcoming SAM2's task-specific feature entanglement

Enabling flexible prompts for diverse interaction modes

Innovation

Methods, ideas, or system contributions that make the work stand out.

Unveils latent semantic structure in SAM2

Repurposes SAM2 for few-shot segmentation

Supports flexible prompts such as points, boxes, and scribbles

🔎 Similar Papers

Retrieval-augmented Few-shot Medical Image Segmentation with Foundation Models