Alternating Gradient Flows: A Theory of Feature Learning in Two-layer Neural Networks

📅 2025-06-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
The mechanistic underpinnings of feature learning in neural networks remain poorly understood. This work introduces Alternating Gradient Flow (AGF), an algorithmic framework that characterizes emergent feature dynamics in two-layer networks trained from small initialization, where training alternates between loss plateaus (dormant neurons slowly aligning to useful directions) and sharp drops (neurons rapidly growing in norm). AGF unifies the analysis of feature-learning order across fully connected linear networks, attention-only linear Transformers, and diagonal linear networks, and gives the first complete characterization of how quadratic networks progressively acquire Fourier features on modular addition tasks. Methodologically, it combines gradient-flow analysis, saddle-to-saddle dynamical modeling, SVD/PCA-based spectral decomposition, and asymptotic convergence arguments. The theory predicts the order, timing, and magnitude of loss drops, and experiments confirm these predictions across architectures. For diagonal linear networks, AGF is proven to converge to the true gradient flow in the vanishing-initialization limit.
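The staircase dynamics summarized above can be reproduced in a minimal setting. The snippet below is an illustrative sketch, not code from the paper: it trains a two-layer linear network `W2 @ W1` by gradient descent from small random initialization to fit a diagonal target, and records when each diagonal mode is fit. The larger singular mode is learned first, matching the saddle-to-saddle picture.

```python
import numpy as np

# Toy saddle-to-saddle demo (illustrative assumption: diagonal target,
# whitened inputs, so the loss is 0.5 * ||W2 @ W1 - A||_F^2).
rng = np.random.default_rng(0)
A = np.diag([4.0, 1.0])                  # target map; singular values 4 > 1
d, h = 2, 2
W1 = 1e-3 * rng.standard_normal((h, d))  # small initialization
W2 = 1e-3 * rng.standard_normal((d, h))

lr, steps = 0.05, 2000
first_hit = {}                           # mode index -> first step it is fit
for t in range(steps):
    P = W2 @ W1
    E = P - A                            # residual
    for k in range(2):
        if k not in first_hit and abs(P[k, k] - A[k, k]) < 0.1:
            first_hit[k] = t
    g1, g2 = W2.T @ E, E @ W1.T          # gradients of the squared loss
    W1 -= lr * g1
    W2 -= lr * g2

print(first_hit)                         # the top mode (k=0) activates first
print(np.linalg.norm(W2 @ W1 - A))       # final residual is small
```

The loss plateaus while each pair of weights slowly aligns, then drops sharply as the corresponding mode's norm grows, so `first_hit[0] < first_hit[1]`.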

📝 Abstract
What features neural networks learn, and how, remains an open question. In this paper, we introduce Alternating Gradient Flows (AGF), an algorithmic framework that describes the dynamics of feature learning in two-layer networks trained from small initialization. Prior works have shown that gradient flow in this regime exhibits a staircase-like loss curve, alternating between plateaus where neurons slowly align to useful directions and sharp drops where neurons rapidly grow in norm. AGF approximates this behavior as an alternating two-step process: maximizing a utility function over dormant neurons and minimizing a cost function over active ones. AGF begins with all neurons dormant. At each round, a dormant neuron activates, triggering the acquisition of a feature and a drop in the loss. AGF quantifies the order, timing, and magnitude of these drops, matching experiments across architectures. We show that AGF unifies and extends existing saddle-to-saddle analyses in fully connected linear networks and attention-only linear transformers, where the learned features are singular modes and principal components, respectively. In diagonal linear networks, we prove AGF converges to gradient flow in the limit of vanishing initialization. Applying AGF to quadratic networks trained to perform modular addition, we give the first complete characterization of the training dynamics, revealing that networks learn Fourier features in decreasing order of coefficient magnitude. Altogether, AGF offers a promising step towards understanding feature learning in neural networks.
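The alternating two-step process in the abstract can be sketched in a toy setting. Assume (this is an illustrative simplification, not the paper's construction) a diagonal linear model fitting a sparse target `beta` under whitened data, so each coordinate acts as one neuron: the utility of a dormant coordinate is the squared residual it could remove, and the cost-minimization step fits that coordinate exactly.

```python
import numpy as np

def agf_toy(beta):
    """Toy AGF-style loop for a diagonal linear model (illustrative only).

    Utility of a dormant coordinate i is taken to be residual[i]**2, the
    loss it could remove; the cost step then fits that coordinate exactly.
    """
    beta = np.asarray(beta, dtype=float)
    residual = beta.copy()            # all neurons start dormant
    dormant = set(range(len(beta)))
    history = []                      # (activated index, loss after the drop)
    while dormant:
        # Step 1: maximize utility over dormant neurons.
        i = max(dormant, key=lambda j: residual[j] ** 2)
        dormant.remove(i)
        # Step 2: minimize cost over active neurons (here: fit coord i).
        residual[i] = 0.0
        history.append((i, float(0.5 * np.sum(residual ** 2))))
    return history

print(agf_toy([3.0, -1.0, 2.0]))  # → [(0, 2.5), (2, 0.5), (1, 0.0)]
```

Coordinates activate in decreasing order of |beta_i|, and each activation drops the loss by 0.5 * beta_i**2, mirroring the order, timing, and magnitude predictions of AGF in the simplest case.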
Problem

Research questions and friction points this paper is trying to address.

How features are learned dynamically in two-layer neural networks is not well understood
Existing saddle-to-saddle analyses of linear networks and transformers are fragmented and architecture-specific
Training dynamics of quadratic networks on modular addition lacked a complete characterization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Alternating Gradient Flows (AGF) framework
Alternating two-step process: utility maximization over dormant neurons, cost minimization over active ones
Quantifies the order, timing, and magnitude of feature-driven loss drops