Few-shot Classification as Multi-instance Verification: Effective Backbone-agnostic Transfer across Domains

📅 2025-06-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses cross-domain few-shot learning under frozen backbone networks (CNNs or Vision Transformers), proposing a general, fine-tuning-free solution. Methodologically, it reformulates few-shot classification as a multi-instance verification (MIV) task—the first such formulation—and introduces a lightweight, backbone-agnostic MIV-head module. This module operates solely during meta-testing, performing pairwise similarity verification between target-domain support and query instances without updating any backbone parameters. Evaluated on the extended Meta-dataset benchmark, the approach matches the performance of leading adaptation methods while substantially outperforming conventional linear classification heads; notably, its adaptation overhead is reduced by an order of magnitude. The work establishes an efficient, plug-and-play paradigm for few-shot transfer learning with black-box feature extractors, enabling rapid domain adaptation without backbone modification.
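To make the verification framing concrete, the following is a minimal sketch of the general idea, not the paper's actual MIV-head: each class's support instances form a "bag" of frozen-backbone embeddings, and a query is classified by scoring it against every bag via pairwise cosine similarity. The aggregation choice (`max` vs. `mean`) and the use of plain cosine similarity are illustrative assumptions; the real MIV-head learns its components during meta-testing.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Normalize embeddings so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def miv_predict(support_emb, support_labels, query_emb, num_classes, agg="max"):
    """Toy multi-instance verification classifier (illustrative, not the MIV-head).

    support_emb:    (S, D) frozen-backbone embeddings of support instances
    support_labels: (S,)   integer class labels for the support set
    query_emb:      (Q, D) frozen-backbone embeddings of query instances
    Returns a (Q,) array of predicted class indices.
    """
    s = l2_normalize(support_emb)
    q = l2_normalize(query_emb)
    sims = q @ s.T  # (Q, S) pairwise cosine similarities, no backbone updates
    scores = np.full((q.shape[0], num_classes), -np.inf)
    for c in range(num_classes):
        bag = sims[:, support_labels == c]  # similarities to class-c support bag
        # Bag-level verification score: how well does the query match this bag?
        scores[:, c] = bag.max(axis=1) if agg == "max" else bag.mean(axis=1)
    return scores.argmax(axis=1)
```

Because only pairwise similarities over precomputed embeddings are involved, adaptation cost stays low and the backbone can remain a black box.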

📝 Abstract
We investigate cross-domain few-shot learning under the constraint that fine-tuning of backbones (i.e., feature extractors) is impossible or infeasible -- a scenario that is increasingly common in practical use cases. Handling the low-quality and static embeddings produced by frozen, "black-box" backbones leads to a problem representation of few-shot classification as a series of multiple instance verification (MIV) tasks. Inspired by this representation, we introduce a novel approach to few-shot domain adaptation, named the "MIV-head", akin to a classification head that is agnostic to any pretrained backbone and computationally efficient. The core components designed for the MIV-head, when trained on few-shot data from a target domain, collectively yield strong performance on test data from that domain. Importantly, it does so without fine-tuning the backbone, and within the "meta-testing" phase. Experimenting under various settings and on an extension of the Meta-dataset benchmark for cross-domain few-shot image classification, using representative off-the-shelf convolutional neural network and vision transformer backbones pretrained on ImageNet1K, we show that the MIV-head achieves highly competitive accuracy when compared to state-of-the-art "adapter" (or partially fine-tuning) methods applied to the same backbones, while incurring substantially lower adaptation cost. We also find well-known "classification head" approaches lag far behind in terms of accuracy. Ablation study empirically justifies the core components of our approach. We share our code at https://github.com/xxweka/MIV-head.
Problem

Research questions and friction points this paper is trying to address.

Cross-domain few-shot learning without fine-tuning backbones
Handling low-quality static embeddings via multi-instance verification
Achieving competitive accuracy with lower adaptation cost
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses multi-instance verification for few-shot learning
Introduces backbone-agnostic MIV-head for domain adaptation
Achieves high accuracy without fine-tuning backbones