Sample Complexity of Transfer Learning: An Optimal Transport Approach

📅 2026-05-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study investigates how transfer learning improves sample efficiency in few-shot, high-dimensional settings, where target-domain data are scarce and the models are complex. The work takes an optimal transport viewpoint of transfer learning, combined with tools from nonparametric statistics and high-dimensional probability, to analyze its sample complexity. The authors establish that, when the data dimension $d$ exceeds 3, transfer learning attains a rate of $O(m^{-(\alpha+1)/d})$ in the target sample size $m$, where $\alpha$ is the smoothness of the data distribution. Direct learning without transfer attains $O(m^{-p/d})$, where $p$ is the smoothness of the optimal target model. Transfer learning therefore has the faster rate whenever $p < \alpha + 1$, which is the case the authors emphasize: target models that are not very smooth, such as highly complex networks with non-smooth activation functions. Image classification experiments illustrate the result, showing that transfer learning substantially improves performance when training samples are scarce.
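Both rates bound the error as a function of the target sample size $m$. If the constants hidden in $O(\cdot)$ are treated as equal (an assumption made only for this comparison; the page does not state them), which rate is faster comes down to the exponents. Here $\alpha$ is the smoothness of the data distribution and $p$ the smoothness of the optimal target model, as defined in the abstract:

```latex
\[
\frac{m^{-(\alpha+1)/d}}{m^{-p/d}} = m^{-(\alpha+1-p)/d}
\;\xrightarrow[m\to\infty]{}\; 0
\quad\Longleftrightarrow\quad
\alpha + 1 > p .
\]
```

So the transfer bound wins asymptotically exactly when $p < \alpha + 1$, and in that regime the ratio between the two bounds shrinks polynomially in $m$. When $p = \alpha + 1$ the two rates coincide, and when $p > \alpha + 1$ the direct-learning rate is faster.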

📝 Abstract

Transfer learning is an essential technique for many machine learning and AI models with complex structures, such as large language models and generative AI. The essence of transfer learning is to leverage knowledge from resolved source tasks for a new target task, especially when the sample size $m$ of the training data for the latter is low. In this work, we rigorously analyze the potential benefit of transfer learning in terms of sample efficiency. Specifically, taking an optimal transport viewpoint of transfer learning, we find that when the data dimension $d$ is higher than $3$, the sample complexity for transfer learning is $O(m^{-(\alpha+1)/d})$, with $\alpha$ indicating the smoothness of the data distribution, as opposed to the $O(m^{-p/d})$ sample complexity for direct learning with $p$ indicating the smoothness of the optimal target model. Our finding theoretically supports better sample efficiency for transfer learning when the target task optimizes over a family of not-so-smooth models (i.e., highly complex networks, possibly with non-smooth activation functions). Using image classification as an example, we numerically demonstrate the sample efficiency of transfer learning: in the data-hungry regime, model performance can be significantly improved by transfer learning.
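To get a feel for the size of the gap, the sketch below inverts each bound to estimate the target sample size needed to reach an error level $\varepsilon$: a rate $m^{-r/d}$ falls below $\varepsilon$ once $m \ge \varepsilon^{-d/r}$. It assumes every constant hidden in $O(\cdot)$ equals 1, and the helper names `samples_needed` and `compare` as well as the parameter values are hypothetical choices for illustration, not taken from the paper.

```python
import math


def samples_needed(eps: float, d: int, rate_numerator: float) -> float:
    """Smallest m with m ** (-rate_numerator / d) <= eps.

    Treats the O(.) bound as exact with constant 1, which is an
    illustrative assumption, not something the paper states.
    """
    if not 0 < eps < 1:
        raise ValueError("eps must lie in (0, 1)")
    if rate_numerator <= 0:
        raise ValueError("rate_numerator must be positive")
    # m ** (-r / d) <= eps  <=>  m >= eps ** (-d / r)
    return math.pow(eps, -d / rate_numerator)


def compare(eps: float, d: int, alpha: float, p: float) -> dict:
    """Samples needed under the transfer bound vs. the direct bound."""
    if d <= 3:
        raise ValueError("the abstract states the result for d > 3")
    m_transfer = samples_needed(eps, d, alpha + 1)  # rate m^{-(alpha+1)/d}
    m_direct = samples_needed(eps, d, p)            # rate m^{-p/d}
    return {
        "transfer": m_transfer,
        "direct": m_direct,
        # > 1 means the transfer bound needs fewer samples
        "ratio": m_direct / m_transfer,
    }


if __name__ == "__main__":
    # Hypothetical setting: d = 8, alpha = 1, p = 1, target error 0.1.
    result = compare(eps=0.1, d=8, alpha=1.0, p=1.0)
    for key in ("transfer", "direct", "ratio"):
        print(f"{key:>8}: {result[key]:.3g}")
```

With these made-up values the transfer bound reaches $\varepsilon = 0.1$ with about $10^4$ samples, while the direct bound needs about $10^8$. The exponent $d/r$ is what makes the gap grow so quickly with dimension.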

Problem

transfer learning

sample complexity

sample efficiency

optimal transport

data scarcity

Innovation

transfer learning

sample complexity

optimal transport