🤖 AI Summary
This work addresses the inefficiency of standard stochastic gradient descent (SGD) in high-dimensional streaming settings by proposing Factor-Augmented SGD (FSGD), which integrates latent factor representations into the optimization process. Because FSGD operates purely on streaming data, it avoids the offline representation learning and full-data storage required by conventional two-stage dimension reduction approaches. The authors also present what they describe as the first theoretical framework that explicitly incorporates latent factor estimation error into the analysis of SGD, and they establish moment convergence in $\ell^s$ norm under decaying step sizes and mini-batch updates. The result is a theoretically grounded and scalable optimization framework for high-dimensional machine learning tasks.
📝 Abstract
Stochastic gradient descent (SGD) is a fundamental optimization algorithm widely used in modern machine learning. In this paper, we propose Factor-Augmented SGD (FSGD), a new optimization method that leverages latent factor representations in high-dimensional learning tasks. Unlike standard two-stage dimension reduction approaches, which rely on offline representation learning and full data storage, FSGD operates purely on streaming data, which makes it scalable to large-scale and high-dimensional problems. Furthermore, we establish the first theoretical framework that explicitly incorporates latent factor estimation error into the analysis of SGD, and we prove moment convergence in $\ell^s$ norm under decaying step sizes and mini-batch updates. Our results provide a new foundation for employing SGD reliably and scalably in high-dimensional machine learning systems.
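The ingredients named in the abstract (streaming data, latent factors, decaying step sizes, mini-batch updates) can be illustrated with a toy sketch. The abstract does not give FSGD's update rules, so everything below is an assumption made for illustration and is not the authors' algorithm: the factor subspace is tracked with an Oja-style streaming PCA step, the loss is squared error on the estimated factors, the step size is $\eta_t = \eta_0 t^{-\alpha}$, and the names `fsgd_sketch`, `synthetic_stream`, `eta0`, and `alpha` are hypothetical.

```python
import numpy as np

def synthetic_stream(Lam, beta, n_steps, batch, rng, noise=0.1):
    """Yield mini-batches (X, y) from a toy linear factor model (illustrative only)."""
    p, k = Lam.shape
    for _ in range(n_steps):
        F = rng.standard_normal((batch, k))                      # latent factors
        X = F @ Lam.T + noise * rng.standard_normal((batch, p))  # observed features
        y = F @ beta + noise * rng.standard_normal(batch)        # response
        yield X, y

def fsgd_sketch(stream, p, k, eta0=0.5, alpha=0.6, seed=0):
    """Hypothetical streaming factor-augmented SGD; NOT the paper's algorithm.

    Each mini-batch is used once and then discarded, so no data is stored.
    """
    rng = np.random.default_rng(seed)
    W, _ = np.linalg.qr(rng.standard_normal((p, k)))  # orthonormal loading basis
    theta = np.zeros(k)                               # coefficients on the factors
    for t, (X, y) in enumerate(stream, start=1):
        eta = eta0 * t ** (-alpha)                    # decaying step size (assumed form)
        B = X.shape[0]
        # Oja-style subspace step (an assumed stand-in for factor estimation).
        W = W + eta * (X.T @ (X @ W)) / (B * p)
        Q, R = np.linalg.qr(W)
        # Force a positive diagonal in R so the basis does not flip sign between steps.
        signs = np.where(np.diag(R) < 0, -1.0, 1.0)
        W = Q * signs
        # Estimated factors, using the common scaling Lambda' Lambda / p = I.
        F_hat = X @ W / np.sqrt(p)
        # Mini-batch gradient of the squared-error loss with respect to theta.
        grad = F_hat.T @ (F_hat @ theta - y) / B
        theta = theta - eta * grad
    return W, theta

# Usage on synthetic data (all values arbitrary).
rng = np.random.default_rng(1)
p, k = 50, 2
Lam = rng.standard_normal((p, k))
beta = np.array([1.0, -2.0])
W, theta = fsgd_sketch(synthetic_stream(Lam, beta, 2000, 32, rng), p, k)
```

In this sketch the same mini-batch drives both the factor update and the parameter update, so the error in `W` feeds directly into the gradient for `theta`. That coupling is the kind of estimation error the abstract says the analysis accounts for, although the paper's actual treatment may differ.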