One-Pass Learning via Bridging Orthogonal Gradient Descent and Recursive Least-Squares

📅 2022-07-28
🏛️ IEEE Conference on Decision and Control
📈 Citations: 22
Influential: 2
🤖 AI Summary
To address memory-constrained, privacy-sensitive online learning over streaming data, where historical samples cannot be stored or revisited, this paper proposes Orthogonal Recursive Fitting (ORFit). ORFit bridges orthogonal gradient descent and recursive least squares to perform single-pass parameter updates that fit each new sample exactly while minimally changing predictions on prior data. It uses memory efficiently by exploiting the structure of the streaming data via incremental principal component analysis (IPCA). Theoretically, for overparameterized linear models, the solution ORFit reaches coincides with the one multi-pass stochastic gradient descent (SGD) would converge to, and the analysis extends to highly overparameterized nonlinear models relevant to deep learning. Experiments demonstrate the method's effectiveness compared to the baselines.
📝 Abstract
While deep neural networks are capable of achieving state-of-the-art performance in various domains, their training typically requires iterating for many passes over the dataset. However, due to computational and memory constraints and potential privacy concerns, storing and accessing all the data is impractical in many real-world scenarios where the data arrives in a stream. In this paper, we investigate the problem of one-pass learning, in which a model is trained on sequentially arriving data without retraining on previous datapoints. Motivated by the increasing use of overparameterized models, we develop Orthogonal Recursive Fitting (ORFit), an algorithm for one-pass learning which seeks to perfectly fit every new datapoint while changing the parameters in a direction that causes the least change to the predictions on previous datapoints. By doing so, we bridge two seemingly distinct algorithms in adaptive filtering and machine learning, namely the recursive least-squares (RLS) algorithm and orthogonal gradient descent (OGD). Our algorithm uses the memory efficiently by exploiting the structure of the streaming data via an incremental principal component analysis (IPCA). Further, we show that, for overparameterized linear models, the parameter vector obtained by our algorithm is what stochastic gradient descent (SGD) would converge to in the standard multi-pass setting. Finally, we generalize the results to the nonlinear setting for highly overparameterized models, relevant for deep learning. Our experiments show the effectiveness of the proposed method compared to the baselines.
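The abstract's update rule can be sketched concretely for the linear case: for a model f(x) = w·x, the gradient of the prediction on a past input is that input itself, so an update direction orthogonal to all past inputs leaves past predictions untouched. Projecting the new input onto that orthogonal complement and stepping just far enough to fit the new label gives a one-pass fit. This is a minimal illustration under those assumptions, not the paper's implementation; the function name and stream interface are mine.

```python
import numpy as np

def orfit_linear(stream, d):
    """One-pass orthogonal fitting sketch for an overparameterized
    linear model f(x) = w @ x (assumes d >= number of samples)."""
    w = np.zeros(d)
    U = np.zeros((d, 0))           # orthonormal basis of past inputs
    for x, y in stream:
        p = x - U @ (U.T @ x)      # component of x orthogonal to past inputs
        err = y - w @ x            # residual on the new datapoint
        if p @ p > 1e-12:
            w = w + (err / (p @ p)) * p                       # fit the new point exactly
            U = np.hstack([U, (p / np.linalg.norm(p))[:, None]])
        # steps along p are orthogonal to every past input, so the
        # predictions w @ x_i on previously seen data do not change
    return w

# toy stream: 3 samples in d = 5 dimensions
rng = np.random.default_rng(0)
X = rng.standard_normal((3, 5))
y = rng.standard_normal(3)
w = orfit_linear(zip(X, y), 5)
```

Because each update both fits the new point and preserves old predictions, the final `w` interpolates the whole stream after a single pass, which is the behavior the equivalence with multi-pass SGD formalizes.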
Problem

Research questions and friction points this paper is trying to address.

How to train in a single pass on streaming data when past examples cannot be stored or revisited
How to fit each new datapoint without degrading predictions on previously seen data
Whether a one-pass method can recover the solution multi-pass SGD converges to, at comparable cost
Innovation

Methods, ideas, or system contributions that make the work stand out.

Parameter updates orthogonal to past gradients that fit each new datapoint exactly
Recursive least-squares structure for efficient per-step updates
Incremental PCA compresses the stored subspace, bounding memory while limiting forgetting on past predictions
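The IPCA bullet above can be illustrated with a hedged sketch: when the stored basis of past directions would exceed a memory budget k, fold the new direction in and keep only the top-k principal directions of the (importance-weighted) collection. The function name, the weighting scheme, and the truncation rule here are my illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def compress_basis(U, s, g, k):
    """IPCA-style compression sketch (assumed interface): fold a new
    direction g into the weighted basis (U, s) and truncate to the
    top-k principal directions, keeping memory at O(d * k)."""
    M = np.hstack([U * s, g[:, None]])      # weighted past directions plus the new one
    Q, sv, _ = np.linalg.svd(M, full_matrices=False)
    return Q[:, :k], sv[:k]                 # leading k directions and their weights

# demo: a 6-dimensional basis held at k = 2 columns as the stream grows
rng = np.random.default_rng(0)
U0, _ = np.linalg.qr(rng.standard_normal((6, 2)))
U, s = compress_basis(U0, np.array([2.0, 1.0]), rng.standard_normal(6), 2)
```

The point of the design is that the basis never grows with the stream: each arrival triggers a small SVD on a d-by-(k+1) matrix, so memory and per-step cost stay fixed regardless of how many datapoints have been seen.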