Benchmarking Transferability: A Framework for Fair and Robust Evaluation

📅 2025-04-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the fundamental lack of fairness and robustness in evaluating model transferability across domains. We propose the first systematic, standardized benchmarking framework for assessing cross-domain transfer capability. Our method introduces a unified evaluation protocol spanning multiple source–target domain pairs, encompassing diverse transfer tasks and perturbation-robustness analysis, and adopts head-training (i.e., linear-probe fine-tuning) as the consistent evaluation paradigm. Empirical analysis reveals significant performance discrepancies among existing transferability metrics under varying experimental settings, undermining their reliability. Our framework substantially improves assessment fidelity, yielding an average 3.5% gain in transfer performance under standard head-training configurations. To foster reproducibility and rigorous comparison, we fully open-source all code, datasets, and evaluation pipelines, establishing a new, standardized paradigm for transferability measurement.

📝 Abstract
Transferability scores aim to quantify how well a model trained on one domain generalizes to a target domain. Despite numerous methods proposed for measuring transferability, their reliability and practical usefulness remain inconclusive, often due to differing experimental setups, datasets, and assumptions. In this paper, we introduce a comprehensive benchmarking framework designed to systematically evaluate transferability scores across diverse settings. Through extensive experiments, we observe variations in how different metrics perform under various scenarios, suggesting that current evaluation practices may not fully capture each method's strengths and limitations. Our findings underscore the value of standardized assessment protocols, paving the way for more reliable transferability measures and better-informed model selection in cross-domain applications. Additionally, our proposed metric achieves a 3.5% improvement in the head-training fine-tuning experimental setup. Our code is available at: https://github.com/alizkzm/pert_robust_platform.
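The head-training (linear-probe) paradigm mentioned above freezes a pretrained backbone, extracts target-domain features once, and fits only a linear classification head on them. The sketch below is a minimal, hypothetical NumPy illustration of that idea (the function names, toy data, and hyperparameters are assumptions, not the paper's implementation):

```python
import numpy as np

def fit_linear_head(features, labels, num_classes, lr=0.05, epochs=200, seed=0):
    """Fit a softmax linear head on frozen backbone features via full-batch GD."""
    rng = np.random.default_rng(seed)
    n, d = features.shape
    W = rng.normal(scale=0.01, size=(d, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[labels]
    for _ in range(epochs):
        logits = features @ W + b
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n                   # softmax cross-entropy gradient
        W -= lr * features.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b

def head_accuracy(features, labels, W, b):
    """Accuracy of the trained linear head on the given features."""
    preds = (features @ W + b).argmax(axis=1)
    return (preds == labels).mean()

# Toy separable synthetic "features" standing in for real backbone outputs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (50, 8)), rng.normal(2, 1, (50, 8))])
y = np.array([0] * 50 + [1] * 50)
W, b = fit_linear_head(X, y, num_classes=2)
print(head_accuracy(X, y, W, b))
```

Because only the head is trained, every candidate backbone is evaluated under identical conditions, which is what makes the protocol a consistent basis for comparing transferability metrics.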
Problem

Research questions and friction points this paper is trying to address.

Evaluating reliability of transferability scores across domains
Addressing inconsistencies in transferability measurement methods
Proposing standardized framework for robust transferability assessment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Comprehensive benchmarking framework for transferability scores
Standardized assessment protocols for reliable measures
Improved metric with 3.5% performance gain
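A benchmark like the one described typically judges a transferability score by how well the ranking it induces over candidate models agrees with the ranking by actual fine-tuned accuracy. Below is a hypothetical sketch of that comparison using Kendall's tau rank correlation; the numbers are illustrative only and do not come from the paper:

```python
def kendall_tau(scores, accuracies):
    """Pairwise rank agreement between a metric's scores and true accuracies."""
    n = len(scores)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (scores[i] - scores[j]) * (accuracies[i] - accuracies[j])
            if s > 0:
                concordant += 1   # the pair is ranked the same way by both
            elif s < 0:
                discordant += 1   # the pair is ranked oppositely
    return (concordant - discordant) / (n * (n - 1) / 2)

# Illustrative values: a hypothetical metric's scores for four candidate
# models vs. the accuracies obtained after head-training each of them.
scores = [0.62, 0.48, 0.71, 0.55]
accuracies = [0.81, 0.74, 0.86, 0.79]
print(kendall_tau(scores, accuracies))  # 1.0: perfectly rank-consistent here
```

A tau near 1 means the metric reliably picks the best model without fine-tuning everything; large swings in tau across datasets or perturbations are exactly the kind of inconsistency the benchmark is designed to expose.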
Alireza Kazemi
The University of Queensland, Brisbane, Australia
Helia Rezvani
The University of Queensland, Brisbane, Australia
Mahsa Baktashmotlagh
The University of Queensland, Brisbane, Australia
Machine Learning · Computer Vision