No time to train! Training-Free Reference-Based Instance Segmentation

πŸ“… 2025-07-03
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Existing prompt-based segmentation models (e.g., SAM) still rely on manual visual prompts or domain-specific prompt-generation rules. To address this, we propose a training-free few-shot instance segmentation framework that performs cross-image instance segmentation using only a small set of reference images. Our method constructs a memory bank from the reference images and leverages a frozen foundation model to extract discriminative features. It then establishes pixel-level correspondences between reference and query images via semantic-aware matching and multi-stage feature aggregation, ultimately generating instance-level masks. Crucially, the approach eliminates both prompt engineering and fine-tuning, drastically reducing human intervention. It achieves 36.8% nAP on COCO FSOD and 71.2% nAP50 on PASCAL VOC Few-Shot, and also surpasses all existing training-free methods on cross-domain FSOD benchmarks. This work introduces an efficient, general-purpose paradigm for few-shot segmentation.

πŸ“ Abstract
The performance of image segmentation models has historically been constrained by the high cost of collecting large-scale annotated data. The Segment Anything Model (SAM) alleviates this problem through a promptable, semantics-agnostic segmentation paradigm, yet it still requires manual visual prompts or complex, domain-dependent prompt-generation rules to process a new image. Towards reducing this new burden, our work investigates the task of object segmentation when provided instead with only a small set of reference images. Our key insight is to leverage the strong semantic priors learned by foundation models to identify corresponding regions between a reference and a target image. We find that such correspondences enable automatic generation of instance-level segmentation masks for downstream tasks, and we instantiate our ideas via a multi-stage, training-free method incorporating (1) memory bank construction, (2) representation aggregation, and (3) semantic-aware feature matching. Our experiments show significant improvements on segmentation metrics, leading to state-of-the-art performance on COCO FSOD (36.8% nAP) and PASCAL VOC Few-Shot (71.2% nAP50), and outperforming existing training-free approaches on the Cross-Domain FSOD benchmark (22.4% nAP).
Problem

Research questions and friction points this paper is trying to address.

Reducing reliance on manual prompts for image segmentation
Enabling segmentation using only reference images
Leveraging semantic priors for automatic mask generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leveraging semantic priors from foundation models
Training-free multi-stage method with memory bank
Semantic-aware feature matching for segmentation
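To illustrate the core matching idea, the following is a minimal sketch of reference-to-query feature matching against a memory bank, using cosine similarity over features from a frozen backbone. This is not the authors' implementation: the function names, the similarity threshold, and the use of a simple per-pixel max over the bank are all assumptions for illustration.

```python
import numpy as np

def build_memory_bank(reference_features):
    """Stack and L2-normalize reference feature vectors (hypothetical helper)."""
    bank = np.stack(reference_features)                      # (N, D)
    return bank / np.linalg.norm(bank, axis=1, keepdims=True)

def match_query(query_features, memory_bank, threshold=0.9):
    """Cosine-similarity matching between query pixels and the memory bank.

    query_features: (H, W, D) per-pixel features from a frozen backbone.
    Returns a binary (H, W) mask marking pixels whose best match in the
    bank exceeds `threshold` (an assumed hyperparameter).
    """
    h, w, d = query_features.shape
    q = query_features.reshape(-1, d)
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    sim = q @ memory_bank.T                                  # (H*W, N) cosine similarities
    best = sim.max(axis=1)                                   # best reference match per pixel
    return (best > threshold).reshape(h, w)

# Toy demo: two 2-D reference vectors, a 2x2 query feature map.
bank = build_memory_bank([np.array([1.0, 0.0]), np.array([0.0, 1.0])])
query = np.array([[[1.0, 0.1], [0.0, 1.0]],
                  [[-1.0, 0.0], [0.5, 0.5]]])
mask = match_query(query, bank, threshold=0.9)
# mask: pixels closely aligned with a reference vector are True,
# the rest False -> [[True, True], [False, False]]
```

In the actual method the mask would come from multi-stage feature aggregation rather than a single threshold, but the sketch shows why no training is needed: everything reduces to similarity lookups against frozen features.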
πŸ”Ž Similar Papers
No similar papers found.