🤖 AI Summary
This study addresses the **reliable assessment of knowledge transferability** in transfer learning, a longstanding challenge hindered by inconsistent evaluation criteria, poor interpretability, and ill-defined applicability scopes. We propose the first **two-dimensional classification framework**, systematically organizing over 60 mainstream transferability metrics along the axes of *transferable knowledge type* (e.g., features, relations, semantics) and *measurement granularity* (sample, task, or domain level), while rigorously reconstructing their mathematical foundations, underlying assumptions, and failure boundaries. Through cross-modal and cross-task empirical analysis, we characterize the efficacy gradients and root limitations of these metrics across paradigms (e.g., pretraining and fine-tuning). Our work establishes a standardized assessment pathway and principled metric-selection guidelines for transferability evaluation, advancing trustworthy AI evaluation infrastructure and identifying key future directions, including dynamic transferability modeling and causally grounded metrics.
📝 Abstract
Transfer learning has become an essential paradigm in artificial intelligence, enabling knowledge from a source task to improve performance on a target task. This approach, particularly through techniques such as pretraining and fine-tuning, has seen significant success in fields such as computer vision and natural language processing. Despite its widespread use, however, reliably assessing the transferability of knowledge remains a challenge, and understanding the theoretical underpinnings of each transferability metric is critical for the success of transfer learning. In this survey, we provide a unified taxonomy of transferability metrics, categorizing them by transferable knowledge type and measurement granularity. We examine the metrics developed to evaluate the potential of source knowledge for transfer and their applicability across different learning paradigms, emphasizing the need for careful metric selection. By offering insights into how different metrics behave under varying conditions, this survey aims to guide researchers and practitioners in selecting the most appropriate metric for a given application, contributing to more efficient, reliable, and trustworthy AI systems. Finally, we discuss open challenges in this field and propose future research directions to further advance the application of transferability metrics in trustworthy transfer learning.
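To make the notion of a transferability metric concrete, the sketch below implements one well-known, lightweight example of the kind of metric such a taxonomy covers: the LEEP score (Log Expected Empirical Prediction, Nguyen et al., 2020), which estimates transferability from a pretrained source classifier's softmax outputs on target data, with no fine-tuning. This is a minimal illustrative implementation, not the survey's own method; the function name and array shapes are our choices.

```python
import numpy as np


def leep_score(source_probs, target_labels):
    """LEEP transferability estimate (higher, i.e. closer to 0, is better).

    source_probs: (n, Z) array of the pretrained source model's softmax
        outputs over its Z source classes for n target samples.
    target_labels: (n,) integer array of target labels in {0, ..., Y-1}.
    """
    n, num_z = source_probs.shape
    num_y = int(target_labels.max()) + 1
    # Empirical joint distribution P(y, z) over target labels and source classes.
    joint = np.zeros((num_y, num_z))
    for y, p in zip(target_labels, source_probs):
        joint[y] += p
    joint /= n
    # Conditional P(y | z) = P(y, z) / P(z).
    cond = joint / joint.sum(axis=0, keepdims=True)
    # Average log-likelihood of the target labels under the "dummy" classifier
    # that pushes source-class probabilities through P(y | z).
    dummy_probs = source_probs @ cond.T  # shape (n, Y), rows sum to 1
    return float(np.mean(np.log(dummy_probs[np.arange(n), target_labels])))


# Usage with synthetic data standing in for real model outputs:
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(5), size=40)   # fake softmax outputs, 5 source classes
labels = rng.integers(0, 3, size=40)         # fake target labels, 3 target classes
print(leep_score(probs, labels))             # a negative scalar; closer to 0 is better
```

Sample-level scores of this kind are cheap to compute, but, as the survey emphasizes, each rests on assumptions (here, that the source head's output distribution is informative about the target labels) that bound where it can be trusted.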