🤖 AI Summary
This work investigates how unsupervised pre-training and transfer learning affect the sample complexity of high-dimensional supervised learning when labeled data are scarce, focusing on online stochastic gradient descent (SGD) for single-layer neural networks. Drawing on high-dimensional statistics and the analysis of single-index models, we establish, under very general assumptions, that pre-training and transfer learning (under concept shift) reduce the required labeled sample size by polynomial factors in the dimension. We also uncover surprising settings in which pre-training yields an exponential improvement in sample complexity over random initialization, underscoring its value when labeled data are limited. Together, these results give a rigorous theoretical account of the statistical benefits of learned initializations, connecting theory with empirical observations in modern deep learning.
📝 Abstract
Unsupervised pre-training and transfer learning are commonly used techniques to initialize training algorithms for neural networks, particularly in settings with limited labeled data. In this paper, we study the effects of unsupervised pre-training and transfer learning on the sample complexity of high-dimensional supervised learning. Specifically, we consider the problem of training a single-layer neural network via online stochastic gradient descent. We establish that pre-training and transfer learning (under concept shift) reduce sample complexity by polynomial factors (in the dimension) under very general assumptions. We also uncover some surprising settings where pre-training grants exponential improvement over random initialization in terms of sample complexity.
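To make the setting concrete, below is a minimal, hypothetical sketch (not the paper's algorithm or analysis) of one-pass online SGD in a single-index teacher-student model, comparing a random initialization against a pre-trained warm start with macroscopic overlap with the teacher direction. The Gaussian data, tanh link, noiseless labels, spherical projection, O(1/d) step size, and warm-start overlap of 0.2 are all illustrative assumptions of this sketch; the polynomial and exponential gaps proved in the paper depend on the link function and the concept-shift model, which this toy example does not reproduce.

```python
import numpy as np

def online_sgd_single_index(d=1000, steps=20_000, lr=None,
                            init_overlap=0.0, seed=0):
    """One-pass (online) SGD for a single-index model y = tanh(<w*, x>).

    Each step draws a fresh Gaussian sample, so the number of steps
    equals the number of labeled examples consumed. `init_overlap`
    sets the student's initial alignment with the teacher direction:
    0.0 mimics random initialization (overlap of order 1/sqrt(d)),
    while a positive value mimics a pre-trained warm start.
    """
    if lr is None:
        lr = 1.0 / d  # O(1/d) step size, standard for high-dimensional online SGD
    rng = np.random.default_rng(seed)

    w_star = rng.standard_normal(d)
    w_star /= np.linalg.norm(w_star)

    w = rng.standard_normal(d)
    w /= np.linalg.norm(w)
    if init_overlap > 0.0:
        # Remove the teacher component, then tilt by exactly `init_overlap`.
        w -= (w @ w_star) * w_star
        w /= np.linalg.norm(w)
        w = init_overlap * w_star + np.sqrt(1.0 - init_overlap**2) * w

    overlaps = np.empty(steps)
    for t in range(steps):
        x = rng.standard_normal(d)        # fresh labeled sample
        y = np.tanh(w_star @ x)           # noiseless teacher label
        pred = np.tanh(w @ x)
        # Squared-loss gradient step; tanh'(z) = 1 - tanh(z)^2.
        w -= lr * (pred - y) * (1.0 - pred**2) * x
        w /= np.linalg.norm(w)            # spherical (projected) SGD
        overlaps[t] = w @ w_star
    return overlaps

def samples_to_align(overlaps, thresh=0.9):
    """Number of samples until the student-teacher overlap exceeds `thresh`."""
    hits = np.nonzero(overlaps >= thresh)[0]
    return int(hits[0]) + 1 if hits.size else None

cold = online_sgd_single_index(init_overlap=0.0)   # random initialization
warm = online_sgd_single_index(init_overlap=0.2)   # pre-trained warm start
print("samples to reach overlap 0.9 (random init):", samples_to_align(cold))
print("samples to reach overlap 0.9 (warm start): ", samples_to_align(warm))
```

Since each SGD step consumes one fresh labeled example, the step count at which the student reaches a fixed alignment with the teacher is a direct proxy for sample complexity, and the warm start reaches it with fewer samples.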