Foreground-Covering Prototype Generation and Matching for SAM-Aided Few-Shot Segmentation

📅 2025-01-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses few-shot segmentation (FSS) with a foreground-covering prototype generation and matching framework. Instead of the conventional matching between support prototypes and query pixels, it matches support prototypes against query prototypes. Both sets of prototypes are built from SAM image-encoder features, which aggregate pixels well, while ResNet features supply class consistency. Iterative cross-attention aggregates foreground features into learnable tokens, and the resulting attention weights serve as a pseudo-mask that guides the ResNet features toward the foreground; support prototypes are generated the same way, using the ground-truth mask. Matching query prototypes against support prototypes yields point/box prompts that drive SAM's mask decoder to predict the final mask. The method reports state-of-the-art results on standard FSS benchmarks, including PASCAL-5i and COCO-20i, with gains in segmentation accuracy in extremely low-data settings and in cross-domain generalization.
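The final matching step (query prototypes compared with support prototypes to produce prompts for SAM's mask decoder) can be illustrated with a small NumPy sketch. Everything here is an assumption for illustration, not the authors' code: the function names `match_prototypes` and `prompts_from_matches`, the cosine-similarity score, the `threshold=0.5` cut-off, and the rule that places a positive point at each matched prototype's peak attention location (with a box enclosing those points) are all hypothetical. `attn` is assumed to be each prototype's cross-attention map over a row-major flattened feature grid. Coordinates are returned as (x, y) points and an (x0, y0, x1, y1) box, the layout SAM's prompt encoder expects; the actual SAM call is not shown.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-8)


def match_prototypes(query_protos, support_protos, threshold=0.5):
    # Hypothetical rule: score each query prototype by its best cosine
    # similarity to any support prototype; keep those above `threshold`.
    scores = (l2_normalize(query_protos) @ l2_normalize(support_protos).T).max(axis=1)
    return np.flatnonzero(scores > threshold), scores


def prompts_from_matches(attn, keep, width):
    # attn: (K, H*W) attention maps, flattened row-major.
    # Point prompt = the pixel each kept prototype attends to most.
    flat = attn[keep].argmax(axis=1)
    ys, xs = np.divmod(flat, width)  # row-major index -> (row, col)
    points = np.stack([xs, ys], axis=1)  # (x, y) order
    if len(points) == 0:
        return points, None
    box = np.array([xs.min(), ys.min(), xs.max(), ys.max()])  # tight box around points
    return points, box


# Toy example: 3 query prototypes, 2 support prototypes, a 4x4 grid.
support = np.array([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]])
query = np.array([[1.0, 0.05, 0.0],   # similar to support -> target
                  [0.0, 0.0, 1.0],    # orthogonal -> background
                  [0.8, 0.2, 0.0]])   # similar to support -> target
H = W = 4
attn = np.full((3, H * W), 1.0 / (H * W))
attn[0, 5] = 0.5   # prototype 0 peaks at row 1, col 1
attn[2, 10] = 0.5  # prototype 2 peaks at row 2, col 2
attn /= attn.sum(axis=1, keepdims=True)

keep, scores = match_prototypes(query, support)
points, box = prompts_from_matches(attn, keep, W)
```

In this toy case the orthogonal prototype is rejected, and the two matched prototypes contribute one point each. Thresholding on similarity to the support set is what lets a prototype-to-prototype scheme discard background tokens before any prompt reaches the decoder.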

📝 Abstract
We propose Foreground-Covering Prototype Generation and Matching to address Few-Shot Segmentation (FSS), which aims to segment target regions in unlabeled query images based on labeled support images. Unlike previous research, which typically estimates target regions in the query using support prototypes and query pixels, we utilize the relationship between support and query prototypes. To achieve this, we use two complementary features: SAM Image Encoder features for pixel aggregation and ResNet features for class consistency. Specifically, we construct support and query prototypes with SAM features and distinguish query prototypes of target regions based on ResNet features. For the query prototype construction, we begin by roughly guiding foreground regions within SAM features using the conventional pseudo-mask, then employ iterative cross-attention to aggregate foreground features into learnable tokens. Here, we discover that the cross-attention weights can effectively serve as an alternative to the conventional pseudo-mask. Therefore, we use the attention-based pseudo-mask to guide ResNet features to focus on the foreground, then infuse the guided ResNet features into the learnable tokens to generate class-consistent query prototypes. The support prototypes are generated symmetrically to the query ones, with the pseudo-mask replaced by the ground-truth mask. Finally, we compare the query prototypes with the support ones to generate prompts, which then produce object masks through the SAM Mask Decoder. State-of-the-art performance on various datasets validates the effectiveness of the proposed method for FSS. Our official code is available at https://github.com/SuhoPark0706/FCP
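The query-prototype construction described in the abstract can be sketched in NumPy under several stated assumptions. This is an illustrative toy, not the official implementation at the linked repository: the functions `conventional_pseudo_mask`, `aggregate_tokens`, and `build_prototypes` are hypothetical; the conventional pseudo-mask is assumed to be a prior-style map (each query pixel's best cosine similarity to a support foreground pixel, computed on ResNet features); "guiding" attention is modeled as adding log(pseudo-mask) to single-head attention logits with no learned projections; the attention-based pseudo-mask is assumed to be the token-summed attention weights rescaled to [0, 1]; and "infusing" ResNet features is modeled as per-token pooling of the guided ResNet features concatenated onto each token. Random arrays stand in for SAM and ResNet features.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Numerically stable softmax along `axis`.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def minmax(x: np.ndarray) -> np.ndarray:
    # Rescale to [0, 1]; a constant input maps to all zeros.
    span = x.max() - x.min()
    return (x - x.min()) / span if span > 0 else np.zeros_like(x)


def l2_normalize(x: np.ndarray) -> np.ndarray:
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-8)


def conventional_pseudo_mask(query_feat, support_feat, support_mask):
    # Assumed prior-style mask: each query pixel's best cosine similarity
    # to any support foreground pixel. Shapes: (Nq, D), (Ns, D), (Ns,).
    fg = support_feat[support_mask > 0.5]
    sim = l2_normalize(query_feat) @ l2_normalize(fg).T  # (Nq, Ns_fg)
    return minmax(sim.max(axis=1))


def aggregate_tokens(tokens, sam_feat, guide, num_iters=3):
    # Hypothetical iterative cross-attention: K tokens (K, C) attend over
    # N pixels (N, C). log(guide) biases the logits so pixels with a low
    # guide value (background) receive almost no attention.
    scale = 1.0 / np.sqrt(sam_feat.shape[1])
    bias = np.log(guide + 1e-6)
    for _ in range(num_iters):
        attn = softmax(tokens @ sam_feat.T * scale + bias, axis=1)  # (K, N)
        tokens = l2_normalize(tokens + attn @ sam_feat)  # residual update
    return tokens, attn


def build_prototypes(tokens, sam_feat, resnet_feat, guide, num_iters=3):
    tokens, attn = aggregate_tokens(tokens, sam_feat, guide, num_iters)
    attn_mask = minmax(attn.sum(axis=0))  # (N,) attention-based pseudo-mask
    guided = resnet_feat * attn_mask[:, None]  # suppress background ResNet features
    pooled = attn @ guided  # (K, D): per-token pooling of guided ResNet features
    return np.concatenate([tokens, pooled], axis=1), attn, attn_mask


# Toy usage: random arrays stand in for SAM and ResNet features on an 8x8 grid.
rng = np.random.default_rng(0)
N, C, D, K = 64, 16, 32, 4
q_sam, s_sam = rng.normal(size=(N, C)), rng.normal(size=(N, C))
q_res, s_res = rng.normal(size=(N, D)), rng.normal(size=(N, D))
s_mask = np.zeros(N)
s_mask[: N // 2] = 1.0  # ground-truth support mask: first half is foreground
init_tokens = l2_normalize(rng.normal(size=(K, C)))  # stand-in for learned tokens

prior = conventional_pseudo_mask(q_res, s_res, s_mask)
q_protos, q_attn, q_mask = build_prototypes(init_tokens, q_sam, q_res, prior)
# Support side is symmetric, with the ground-truth mask as the guide.
s_protos, s_attn, _ = build_prototypes(init_tokens, s_sam, s_res, s_mask)
```

The log-bias form is one simple way to let a soft mask steer attention: a guide value of 1 leaves a logit unchanged, while a value near 0 pushes it toward minus infinity, so on the support side the ground-truth mask effectively restricts the tokens to foreground pixels.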
Problem

Research questions and friction points this paper is trying to address.

Few-shot Segmentation
Limited Labeled Data
Target Region Identification
Innovation

Methods, ideas, or system contributions that make the work stand out.

Foreground Prototype Generation
Few-Shot Segmentation
Feature Point Localization
Suho Park, Sungkyunkwan University
Subeen Lee, Sungkyunkwan University
Hyun Seok Seong, Sungkyunkwan University (Machine Learning, Computer Vision)
Jaejoon Yoo, Sungkyunkwan University
Jae-pil Heo, Sungkyunkwan University