🤖 AI Summary
To address image ambiguity in few-shot image classification caused by multi-object coexistence or complex backgrounds, this paper proposes a localization-aware preprocessing method that integrates the Segment Anything Model (SAM) with object-centric cropping—requiring no fine-grained pixel-level annotations—to automatically extract foreground regions and emphasize discriminative local features. The method significantly reduces reliance on manual annotation while enhancing feature discriminability. Evaluated on standard few-shot benchmarks—including Mini-ImageNet, Tiered-ImageNet, and CUB—it achieves state-of-the-art performance, yielding average accuracy improvements of 2.3–4.7 percentage points. Comprehensive experiments demonstrate that the proposed approach exhibits strong generalization capability and computational efficiency, offering a novel and robust paradigm for representation learning in few-shot settings.
📝 Abstract
In the domain of Few-Shot Image Classification, operating with as little as one example per class, the presence of image ambiguities stemming from multiple objects or complex backgrounds can significantly deteriorate performance. Our research demonstrates that incorporating additional information about the local positioning of an object within its image markedly enhances classification across established benchmarks. More importantly, we show that a significant fraction of the improvement can be achieved through the use of the Segment Anything Model, requiring only a pixel of the object of interest to be pointed out, or by employing fully unsupervised foreground object extraction methods.