Understanding Incremental Learning with Closed-form Solution to Gradient Flow on Overparameterized Matrix Factorization

📅 2025-08-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates the incremental learning phenomenon in over-parameterized matrix factorization—specifically, why small-initialization gradient flow learns singular components of the target matrix in descending order of singular values. We derive a closed-form dynamical characterization based on Riccati-type matrix differential equations. Our analysis rigorously establishes time-scale separation as the core mechanism: components associated with larger singular values evolve as fast variables and converge rapidly, while those tied to smaller singular values act as slow variables, activating later. By tuning the initialization scale, we achieve precise control over the learning sequence and enable controllable low-rank approximation. Methodologically, the approach integrates analytical tools from matrix differential equations, gradient flow dynamics modeling, and symmetric decomposition theory. It yields the first quantitative characterization of the entire learning trajectory and provides an extensible theoretical framework for asymmetric factorizations.
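The mechanism described above can be seen numerically. Below is a minimal sketch (not the paper's exact setup): gradient descent with a small step size as a proxy for gradient flow on the symmetric factorization loss, with an illustrative rank-3 target and a small initialization scale. Early in training only the largest eigenvalue has activated; later the smaller ones follow, in decreasing order.

```python
import numpy as np

# Hedged sketch: small-step gradient descent approximating gradient flow on
#   f(U) = (1/4) * ||U U^T - M||_F^2,  M symmetric PSD.
# The target eigenvalues and the initialization scale alpha are illustrative.
rng = np.random.default_rng(0)

d = 6
target_eigs = np.array([5.0, 3.0, 1.0, 0.0, 0.0, 0.0])   # rank-3 target
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
M = Q @ np.diag(target_eigs) @ Q.T

alpha = 1e-3                       # small initialization scale
U = alpha * rng.standard_normal((d, d))

eta = 1e-3                         # step size; "time" t ~ eta * step
snap_early = None
for step in range(60000):
    R = U @ U.T - M
    U -= eta * (R @ U)             # gradient of f with respect to U
    if step == 2000:               # early time t ~ 2
        snap_early = np.sort(np.linalg.eigvalsh(U @ U.T))[::-1]

final = np.sort(np.linalg.eigvalsh(U @ U.T))[::-1]
# Early on, only the largest component has activated; by the end all three
# target eigenvalues are recovered, having activated in decreasing order.
print(np.round(snap_early[:3], 3), np.round(final[:3], 3))
```

The early snapshot shows the time-scale separation directly: the top component is essentially learned while the smallest is still dormant near its initialization scale.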

📝 Abstract
Many theoretical studies on neural networks attribute their excellent empirical performance to the implicit bias or regularization induced by first-order optimization algorithms when training networks under certain initialization assumptions. One example is the incremental learning phenomenon in gradient flow (GF) on an overparameterized matrix factorization problem with small initialization: GF learns a target matrix by sequentially learning its singular values in decreasing order of magnitude over time. In this paper, we develop a quantitative understanding of this incremental learning behavior for GF on the symmetric matrix factorization problem, using its closed-form solution obtained by solving a Riccati-like matrix differential equation. We show that incremental learning emerges from a time-scale separation among dynamics corresponding to learning different components in the target matrix. By decreasing the initialization scale, these time-scale separations become more prominent, allowing one to find low-rank approximations of the target matrix. Lastly, we discuss possible avenues for extending this analysis to asymmetric matrix factorization problems.
Problem

Research questions and friction points this paper is trying to address.

Analyzing incremental learning dynamics in overparameterized matrix factorization
Understanding time-scale separation in gradient flow closed-form solutions
Extending analysis from symmetric to asymmetric matrix factorization problems
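How a Riccati-type closed form produces time-scale separation can be sketched in the aligned (commuting-initialization) case; this simplifying assumption is ours for illustration and need not match the paper's exact conditions:

```latex
% Sketch under the alignment assumption X(0) = \alpha^2 I commuting with M.
\begin{aligned}
&\text{Loss: } f(U) = \tfrac{1}{4}\|UU^\top - M\|_F^2,
\qquad \dot U = -(UU^\top - M)\,U. \\[4pt]
&\text{For } X = UU^\top:\quad
\dot X = -(X - M)X - X(X - M) \quad \text{(Riccati-type).} \\[4pt]
&\text{If } X(0) = \alpha^2 I \text{ and } M = Q\Lambda Q^\top,
\text{ the eigenvalues } x_i \text{ of } X \text{ decouple:} \\
&\qquad \dot x_i = -2\,x_i\,(x_i - \lambda_i), \qquad x_i(0) = \alpha^2, \\[4pt]
&\qquad x_i(t) = \frac{\lambda_i\,\alpha^2}
{\alpha^2 + (\lambda_i - \alpha^2)\,e^{-2\lambda_i t}}. \\[4pt]
&\text{Component } i \text{ activates near }
T_i \approx \frac{1}{2\lambda_i}\log\frac{\lambda_i}{\alpha^2}:
\text{ larger } \lambda_i \text{ activate earlier,} \\
&\text{and smaller } \alpha \text{ widens the gaps between activation times.}
\end{aligned}
```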
Innovation

Methods, ideas, or system contributions that make the work stand out.

Closed-form solution for gradient flow
Time-scale separation in learning dynamics
Small initialization enables low-rank approximation
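The last point can be illustrated with the logistic-form solution of the decoupled (aligned-initialization) dynamics; the eigenvalues and the scale `alpha` below are illustrative assumptions, not values from the paper:

```python
import numpy as np

# Hedged sketch: logistic-form solution of the decoupled eigenvalue dynamics,
#   x_i(t) = lam_i * a2 / (a2 + (lam_i - a2) * exp(-2 * lam_i * t)),  a2 = alpha^2,
# valid under the commuting-initialization simplification.
def component(lam, alpha, t):
    a2 = alpha ** 2
    return lam * a2 / (a2 + (lam - a2) * np.exp(-2.0 * lam * t))

lams = np.array([5.0, 3.0, 1.0])
alpha = 1e-6                        # smaller alpha -> wider activation gaps

# Component i activates around T_i ~ log(lam_i / alpha^2) / (2 * lam_i).
T = np.log(lams / alpha**2) / (2 * lams)
t_stop = 0.5 * (T[0] + T[1])        # a time between the first two activations

x = component(lams, alpha, t_stop)
# At t_stop the top component is essentially learned while the rest remain
# near zero: stopping here yields a rank-1 approximation of the target.
print(np.round(x, 4))
```

Shrinking `alpha` stretches all activation times but widens the gaps between them, which is what makes the stopping time (and hence the approximation rank) controllable.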
Hancheng Min
Shanghai Jiao Tong University
Deep Learning Theory · Dynamical Systems and Control · Networked Systems
René Vidal
Department of Electrical and Systems Engineering (ESE) and the Department of Radiology in the Perelman School of Medicine at the University of Pennsylvania