Few-Shot Semantic Segmentation Meets SAM3

📅 2026-04-07
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the limitations of few-shot semantic segmentation—namely, its heavy reliance on extensive episodic training and sensitivity to distribution shifts—by proposing a training-free solution. The method leverages the frozen Segment Anything Model 3 (SAM3) and activates its Promptable Concept Segmentation (PCS) capability by spatially concatenating support and query images onto a shared canvas, enabling cross-image reasoning without any fine-tuning. This study is the first to employ SAM3 as a training-free few-shot segmenter, and it reveals that negative prompts can degrade target representations and induce prediction collapse in few-shot settings, challenging prevailing assumptions about prompting mechanisms. The approach achieves state-of-the-art performance on PASCAL-5^i and COCO-20^i, demonstrating that simple spatial composition alone can unlock strong generalization.
📝 Abstract
Few-Shot Semantic Segmentation (FSS) focuses on segmenting novel object categories from only a handful of annotated examples. Most existing approaches rely on extensive episodic training to learn transferable representations, which is both computationally demanding and sensitive to distribution shifts. In this work, we revisit FSS from the perspective of modern vision foundation models and explore the potential of Segment Anything Model 3 (SAM3) as a training-free solution. By repurposing its Promptable Concept Segmentation (PCS) capability, we adopt a simple spatial concatenation strategy that places support and query images into a shared canvas, allowing a fully frozen SAM3 to perform segmentation without any fine-tuning or architectural changes. Experiments on PASCAL-$5^i$ and COCO-$20^i$ show that this minimal design already achieves state-of-the-art performance, outperforming many heavily engineered methods. Beyond empirical gains, we uncover that negative prompts can be counterproductive in few-shot settings, where they often weaken target representations and lead to prediction collapse despite their intended role in suppressing distractors. These findings suggest that strong cross-image reasoning can emerge from simple spatial formulations, while also highlighting limitations in how current foundation models handle conflicting prompt signals. Code at: https://github.com/WongKinYiu/FSS-SAM3
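The shared-canvas idea can be sketched as follows. This is a minimal illustration, not the authors' implementation: the side-by-side layout, the red-tint mask overlay, and the function name `compose_canvas` are assumptions for demonstration; the paper's code at the linked repository defines the actual composition.

```python
import numpy as np

def compose_canvas(support_img, support_mask, query_img):
    """Place a support image (with its annotated object highlighted) and a
    query image side by side on one canvas, so a frozen promptable
    segmenter can reason across both images in a single forward pass.

    support_img: (H, W, 3) uint8 array; support_mask: (H, W) binary array;
    query_img: (H', W', 3) uint8 array.
    """
    h = max(support_img.shape[0], query_img.shape[0])
    w = support_img.shape[1] + query_img.shape[1]
    canvas = np.zeros((h, w, 3), dtype=support_img.dtype)

    # Make the support annotation visible on the canvas: simple red tint
    # where the mask is set (one of many possible ways to surface it).
    overlaid = support_img.copy()
    sel = support_mask.astype(bool)
    overlaid[sel] = (0.5 * overlaid[sel] + 0.5 * np.array([255, 0, 0])).astype(
        support_img.dtype
    )

    # Support on the left, query on the right; unused rows stay zero-padded.
    canvas[: support_img.shape[0], : support_img.shape[1]] = overlaid
    canvas[: query_img.shape[0], support_img.shape[1]:] = query_img
    return canvas
```

Feeding such a canvas to the frozen model lets the PCS prompt refer to the support object while the prediction is read off the query half, without any fine-tuning.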
Problem

Research questions and friction points this paper is trying to address.

Few-Shot Semantic Segmentation
Segment Anything Model
Promptable Concept Segmentation
Distribution Shift
Foundation Models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Few-Shot Semantic Segmentation
Segment Anything Model
Promptable Concept Segmentation
Training-Free
Negative Prompt