Features are fate: a theory of transfer learning in high-dimensional regression

📅 2024-10-10
🏛️ arXiv.org
📈 Citations: 2
Influential: 0
🤖 AI Summary
This work addresses a fundamental challenge in transfer learning: characterizing task similarity precisely enough to predict transfer performance reliably. The authors propose a theoretical framework grounded in feature-space overlap rather than distributional distance, enabling principled transfer prediction. Through analytical modeling of deep linear networks, they show that strong overlap between the feature spaces of the source and target tasks is the key condition for successful transfer, and they construct an analytically tractable transferability phase diagram governed jointly by target-task sample size and feature overlap. When overlap is high, both linear transfer and fine-tuning significantly outperform training from scratch, especially in the low-data regime; these findings are validated numerically on nonlinear networks. Notably, the work challenges the conventional reliance on φ-divergences and other distributional metrics for assessing transferability, providing both a rigorous theoretical foundation and practical criteria for data-limited transfer learning.
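A minimal numerical sketch of the feature-space-overlap idea. The paper's precise definition is not reproduced here; the `feature_overlap` function below and its normalization are illustrative assumptions, measuring how much the top-k left singular subspaces of two task matrices coincide (1.0 = identical feature spaces, 0.0 = orthogonal):

```python
import numpy as np

def feature_overlap(W_source, W_target, k):
    """Illustrative overlap score between the top-k left singular
    subspaces of two task matrices (1.0 = same span, 0.0 = orthogonal)."""
    U_s = np.linalg.svd(W_source)[0][:, :k]
    U_t = np.linalg.svd(W_target)[0][:, :k]
    # ||U_s^T U_t||_F^2 equals k when the subspaces coincide, 0 when orthogonal
    return np.linalg.norm(U_s.T @ U_t) ** 2 / k

rng = np.random.default_rng(0)
U = np.linalg.qr(rng.standard_normal((20, 20)))[0]  # orthonormal directions

# Source and target tasks built from the SAME two feature directions
W_s = U[:, :2] @ rng.standard_normal((2, 30))
W_t = U[:, :2] @ rng.standard_normal((2, 30))
print(feature_overlap(W_s, W_t, k=2))  # close to 1.0

# A task built from DISJOINT feature directions
W_o = U[:, 2:4] @ rng.standard_normal((2, 30))
print(feature_overlap(W_s, W_o, k=2))  # close to 0.0
```

Two tasks can have nearly identical input distributions yet score zero here, which is the sense in which distributional distances can fail to predict transfer.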

📝 Abstract
With the emergence of large-scale pre-trained neural networks, methods to adapt such "foundation" models to data-limited downstream tasks have become a necessity. Fine-tuning, preference optimization, and transfer learning have all been successfully employed for these purposes when the target task closely resembles the source task, but a precise theoretical understanding of "task similarity" is still lacking. While conventional wisdom suggests that simple measures of similarity between source and target distributions, such as $\phi$-divergences or integral probability metrics, can directly predict the success of transfer, we prove the surprising fact that, in general, this is not the case. We adopt, instead, a feature-centric viewpoint on transfer learning and establish a number of theoretical results that demonstrate that when the target task is well represented by the feature space of the pre-trained model, transfer learning outperforms training from scratch. We study deep linear networks as a minimal model of transfer learning in which we can analytically characterize the transferability phase diagram as a function of the target dataset size and the feature space overlap. For this model, we establish rigorously that when the feature space overlap between the source and target tasks is sufficiently strong, both linear transfer and fine-tuning improve performance, especially in the low data limit. These results build on an emerging understanding of feature learning dynamics in deep linear networks, and we demonstrate numerically that the rigorous results we derive for the linear case also apply to nonlinear networks.
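The abstract's central claim, that linear transfer beats training from scratch in the low-data limit when feature overlap is high, can be illustrated with a toy linear-regression simulation. This is a sketch under stated assumptions, not the paper's experimental setup: the shared feature direction `u`, the sample sizes, and the noise level are all chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_src, n_tgt, n_test = 50, 2000, 10, 1000

# High feature overlap: source and target teachers read out
# the same one-dimensional feature direction u of the input
u = rng.standard_normal(d)
u /= np.linalg.norm(u)
w_src, w_tgt = 2.0 * u, 1.5 * u

def make_data(w, n, noise=0.1):
    X = rng.standard_normal((n, d))
    return X, X @ w + noise * rng.standard_normal(n)

Xs, ys = make_data(w_src, n_src)   # abundant source data
Xt, yt = make_data(w_tgt, n_tgt)   # only 10 target samples
Xq, yq = make_data(w_tgt, n_test, noise=0.0)  # clean test set

# "Pre-training": estimate the feature direction from the source task
w_hat = np.linalg.lstsq(Xs, ys, rcond=None)[0]
feat = w_hat / np.linalg.norm(w_hat)

# Linear transfer: freeze the feature, fit only a scalar head on target data
z = Xt @ feat
head = z @ yt / (z @ z)
err_transfer = np.mean((Xq @ (head * feat) - yq) ** 2)

# From scratch: least squares on the 10 target samples alone (underdetermined)
w_scratch = np.linalg.lstsq(Xt, yt, rcond=None)[0]
err_scratch = np.mean((Xq @ w_scratch - yq) ** 2)

print(err_transfer, err_scratch)  # transfer error is far smaller
```

With ten samples in fifty dimensions, the from-scratch fit cannot recover the teacher, while the transferred feature reduces the target problem to estimating a single scalar. Rotating the target teacher away from `u` (lowering overlap) erodes and eventually reverses this advantage, which is the phase-transition behavior the paper characterizes analytically.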
Problem

Research questions and friction points this paper is trying to address.

Understanding task similarity in transfer learning
Feature-centric theory for transfer learning success
Phase diagram of transferability in deep networks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Feature-centric theory of transfer learning
Analytical study of deep linear networks
Feature-space overlap as the key predictor of transfer success