FS-SAM2: Adapting Segment Anything Model 2 for Few-Shot Semantic Segmentation via Low-Rank Adaptation

📅 2025-09-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses few-shot semantic segmentation (FSS) with a lightweight fine-tuning framework, presented as the first to transfer SAM2's video segmentation capability to this setting. Methodologically, it repurposes SAM2's temporal modeling module (previously unexplored in FSS) to strengthen cross-sample feature alignment, and employs Low-Rank Adaptation (LoRA) to fine-tune only about 0.5% of the backbone parameters, enabling flexible K-shot adaptation. Key contributions: (1) leveraging the implicit category generalization embedded in SAM2's temporal module in place of conventional prototype matching, and (2) improving robustness to unseen categories via spatio-temporal joint feature distillation. The method reports state-of-the-art performance on PASCAL-5ⁱ, COCO-20ⁱ, and FSS-1000, with average mIoU gains of 2.3–4.7 percentage points, while retaining SAM2's native inference efficiency and adding fewer than 1M parameters.

📝 Abstract
Few-shot semantic segmentation has recently attracted great attention. The goal is to develop a model capable of segmenting unseen classes using only a few annotated samples. Most existing approaches adapt a pre-trained model by training an additional module from scratch; achieving optimal performance with these approaches requires extensive training on large-scale datasets. The Segment Anything Model 2 (SAM2) is a foundational model for zero-shot image and video segmentation with a modular design. In this paper, we propose a Few-Shot segmentation method based on SAM2 (FS-SAM2), where SAM2's video capabilities are directly repurposed for the few-shot task. Moreover, we apply Low-Rank Adaptation (LoRA) to the original modules in order to handle the diverse images typically found in standard datasets, unlike the temporally connected frames used in SAM2's pre-training. With this approach, only a small number of parameters are meta-trained, which effectively adapts SAM2 while benefiting from its impressive segmentation performance. Our method supports any K-shot configuration. We evaluate FS-SAM2 on the PASCAL-5$^i$, COCO-20$^i$ and FSS-1000 datasets, achieving remarkable results and demonstrating excellent computational efficiency during inference. Code is available at https://github.com/fornib/FS-SAM2
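The abstract's central mechanism, freezing SAM2's pre-trained weights and meta-training only low-rank adapters, can be illustrated with a minimal LoRA layer. This is a hedged sketch, not the paper's code: the class name `LoRALinear`, the rank, and the init scheme are illustrative assumptions; only the general LoRA recipe (frozen weight plus a trainable low-rank update, with the up-projection zero-initialized so training starts from the pre-trained behavior) comes from the technique the paper names.

```python
import numpy as np

class LoRALinear:
    """Illustrative LoRA-augmented linear layer (not the paper's implementation).

    W stands in for a frozen pre-trained SAM2 projection; only the low-rank
    factors A and B are trainable, adding just rank * (d_in + d_out) parameters.
    """

    def __init__(self, weight, rank=4, alpha=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.W = weight                                   # frozen, shape (d_out, d_in)
        d_out, d_in = weight.shape
        self.A = rng.normal(0.0, 0.02, size=(rank, d_in))  # trainable down-projection
        self.B = np.zeros((d_out, rank))                   # trainable up-projection, zero init
        self.scale = alpha / rank

    def __call__(self, x):
        # Frozen path plus scaled low-rank update: x (W + s * B A)^T
        return x @ self.W.T + self.scale * (x @ self.A.T @ self.B.T)
```

Because `B` is zero-initialized, the layer is exactly the frozen pre-trained layer at the start of meta-training, which is what lets the adapter be trained on few-shot episodes without disturbing SAM2's original segmentation behavior.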
Problem

Research questions and friction points this paper is trying to address.

Adapting SAM2's video segmentation capability to the few-shot setting
Handling diverse, temporally unrelated images via Low-Rank Adaptation
Achieving efficient segmentation while meta-training only a small fraction of parameters
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adapts SAM2 for few-shot segmentation
Uses Low-Rank Adaptation (LoRA) technique
Meta-trains only a small parameter subset
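The repurposing idea listed above, treating the K annotated support images as if they were SAM2's "past frames" and the query image as the "current frame", can be sketched as an attention-based readout over a support memory. Everything below is an assumption for illustration: the function name `kshot_query_readout`, the flat patch-feature shapes, and the single-head attention are simplifications, not SAM2's actual memory-attention module.

```python
import numpy as np

def kshot_query_readout(support_feats, support_masks, query_feats, tau=1.0):
    """Hedged sketch: K support images play the role of SAM2 'past frames'.

    support_feats: (K, N, C) patch features of the K annotated support images
    support_masks: (K, N)    binary foreground labels per support patch
    query_feats:   (M, C)    patch features of the query image
    Returns a soft per-patch foreground score for the query, obtained by
    attending from query patches to the pooled support memory.
    """
    K, N, C = support_feats.shape
    keys = support_feats.reshape(K * N, C)                # memory keys
    values = support_masks.reshape(K * N, 1).astype(float)  # memory values = masks
    logits = query_feats @ keys.T / (np.sqrt(C) * tau)
    attn = np.exp(logits - logits.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)               # softmax over memory
    return (attn @ values).ravel()                        # foreground score per patch
```

Because the memory is just a concatenation over K supports, the same code handles any K-shot configuration, which mirrors the flexibility the abstract claims for FS-SAM2.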