🤖 AI Summary
Evaluating transfer learning across shifted domains, different tasks, and changed architectures remains difficult because existing transferability metrics predict transfer performance unreliably. Method: We propose a lightweight transferability evaluator based on k-nearest neighbor (k-NN) classification in pretrained feature space. It requires no gradient computation, fine-tuning, or auxiliary model training, relying solely on pretrained features and a small number of target-domain labels. Contribution/Results: In a systematic evaluation spanning over 42,000 experiments, 23 baseline metrics, and 16 datasets, we show that mainstream transferability measures frequently fail to predict transfer performance, while the k-NN evaluator improves prediction correlation by an average of 27% over all existing methods. The approach is computationally efficient, broadly applicable, and easy to implement, making it a strong default for transfer learning assessment.
📝 Abstract
How well can one expect transfer learning to work in a new setting where the domain is shifted, the task is different, and the architecture changes? Many transfer learning metrics have been proposed to answer this question. But how accurate are their predictions in a realistic new setting? We conducted an extensive evaluation involving over 42,000 experiments comparing 23 transferability metrics across 16 different datasets to assess their ability to predict transfer performance. Our findings reveal that none of the existing metrics perform well across the board. However, we find that a simple k-nearest neighbor evaluation -- as is commonly used to evaluate feature quality for self-supervision -- not only surpasses existing metrics, but also offers better computational efficiency and ease of implementation.
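The k-NN evaluation the abstract refers to can be sketched as follows. This is a hypothetical minimal implementation, not the paper's exact protocol: it scores transferability as leave-one-out k-NN accuracy on pretrained features of a few labeled target samples. The function name, distance metric, and default `k` are illustrative assumptions; note that no gradients, fine-tuning, or auxiliary models are involved.

```python
# Hedged sketch of a k-NN transferability score (illustrative, not the
# paper's exact protocol). Assumes pretrained embeddings and a small set
# of target-domain labels are already available.
import numpy as np
from collections import Counter

def knn_transferability_score(features, labels, k=5):
    """Leave-one-out k-NN accuracy in pretrained feature space.

    features: (n, d) array of pretrained embeddings for labeled target samples
    labels:   (n,) array of target-domain labels
    Returns the fraction of samples whose k nearest neighbors
    (excluding the sample itself) vote for the correct label.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distances; no gradient computation needed.
    dists = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)  # exclude each point from its own neighbors
    correct = 0
    for i in range(len(labels)):
        nn_idx = np.argsort(dists[i])[:k]          # k nearest neighbors
        vote = Counter(labels[nn_idx]).most_common(1)[0][0]  # majority label
        correct += vote == labels[i]
    return correct / len(labels)
```

A higher score suggests the pretrained features already separate the target classes well, which is the intuition behind using feature-space k-NN as a transferability proxy. For larger label sets, the pairwise-distance matrix could be replaced by an approximate nearest-neighbor index, but the brute-force version keeps the sketch self-contained.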