Provable Benefits of Unsupervised Pre-training and Transfer Learning via Single-Index Models

📅 2025-02-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates how unsupervised pre-training and transfer learning affect the sample complexity of high-dimensional supervised learning with limited labeled data, focusing on online stochastic gradient descent (SGD) for single-layer neural networks. Using tools from high-dimensional statistics and the analysis of single-index models, the authors prove under very general assumptions that pre-training and transfer learning (under concept shift) reduce the required labeled sample size by polynomial factors in the dimension. They also identify settings in which pre-training yields an exponential improvement over random initialization, underscoring its value when labeled data are scarce. The analysis gives a rigorous characterization of the statistical benefits of representation learning, connecting theory with empirical practice in modern deep learning.

📝 Abstract
Unsupervised pre-training and transfer learning are commonly used techniques to initialize training algorithms for neural networks, particularly in settings with limited labeled data. In this paper, we study the effects of unsupervised pre-training and transfer learning on the sample complexity of high-dimensional supervised learning. Specifically, we consider the problem of training a single-layer neural network via online stochastic gradient descent. We establish that pre-training and transfer learning (under concept shift) reduce sample complexity by polynomial factors (in the dimension) under very general assumptions. We also uncover some surprising settings where pre-training grants exponential improvement over random initialization in terms of sample complexity.
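The abstract's setup, online SGD on a single-index model with a pre-trained versus a random initialization, can be illustrated numerically. The following is a toy sketch under assumptions of my own (a `tanh` link, squared loss, spherical SGD, and a planted overlap of 0.3 standing in for pre-training); it is not the paper's algorithm or proof setting, and the dimension, step counts, and step size are illustrative.

```python
import numpy as np

# Toy sketch (not the paper's exact setting): online SGD for a single-index
# model y = phi(<w_star, x>), comparing a random ("cold") initialization
# against a "pre-trained" one planted with macroscopic overlap with w_star.
rng = np.random.default_rng(0)
d = 200
phi = np.tanh

w_star = rng.standard_normal(d)
w_star /= np.linalg.norm(w_star)

def online_sgd_overlap(w0, n_steps=800, lr=0.5):
    """One pass of streaming SGD on squared loss, projecting back to the
    unit sphere after each step; returns the final overlap <w, w_star>."""
    w = w0 / np.linalg.norm(w0)
    for _ in range(n_steps):
        x = rng.standard_normal(d)               # fresh sample per step (online SGD)
        y = phi(w_star @ x)
        pred = phi(w @ x)
        grad = (pred - y) * (1.0 - pred**2) * x  # d/dw of 0.5 * (pred - y)^2
        w -= (lr / d) * grad
        w /= np.linalg.norm(w)
    return float(w @ w_star)

# Cold start: overlap with w_star is O(1/sqrt(d)) at initialization.
w_cold = rng.standard_normal(d)

# Warm start: plant overlap 0.3, mimicking a pre-trained direction.
z = rng.standard_normal(d)
z -= (z @ w_star) * w_star
z /= np.linalg.norm(z)
w_warm = 0.3 * w_star + np.sqrt(1 - 0.3**2) * z

m_cold = online_sgd_overlap(w_cold)
m_warm = online_sgd_overlap(w_warm)
print(f"overlap after SGD  cold: {m_cold:+.3f}  warm: {m_warm:+.3f}")
```

With the same per-step budget, the warm-started run stays ahead in overlap with the target direction, which is the qualitative phenomenon the paper quantifies (polynomial, and in some settings exponential, savings in labeled samples).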
Problem

Research questions and friction points this paper is trying to address.

Does unsupervised pre-training provably reduce the sample complexity of high-dimensional supervised learning?
When does transfer learning (under concept shift) improve the efficiency of neural network training?
Can a good initialization ever yield more than polynomial gains over random initialization?
Innovation

Methods, ideas, or system contributions that make the work stand out.

Proof, under very general assumptions, that pre-training reduces sample complexity by polynomial factors in the dimension
Analysis of transfer learning under concept shift via online SGD on single-index models
Identification of settings where pre-training yields an exponential improvement over random initialization
Taj Jones-McCormick
Department of Statistics and Actuarial Science, University of Waterloo, Canada
Aukosh Jagannath
Canada Research Chair in Mathematical Foundations of Data Science, University of Waterloo
Probability · Mathematics of Data Science · Statistical Physics
Subhabrata Sen
Department of Statistics, Harvard University, United States of America