Regime Change Hypothesis: Foundations for Decoupled Dynamics in Neural Network Training

📅 2026-02-09
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the limited understanding of training dynamics in deep neural networks with ReLU activations, particularly regarding how activation patterns evolve during optimization. The study proposes that training unfolds over two distinct time scales: an initial phase characterized by rapid changes in activation patterns, followed by a later phase where weights are fine-tuned within stable activation regions. Leveraging a geometric perspective, the authors develop a theoretical framework for activation pattern stability, supported by measure-theoretic analysis of local stability. They empirically track activation and weight trajectories across fully connected, convolutional, and Transformer architectures, revealing that activation patterns stabilize approximately three times earlier than weight updates converge. This consistent observation—“activations converge first, weights fine-tune later”—provides a foundational insight for staged optimization strategies in deep learning.

📝 Abstract
Despite the empirical success of DNNs, their internal training dynamics remain difficult to characterize. In ReLU-based models, the activation pattern induced by a given input determines the piecewise-linear region in which the network behaves affinely. Motivated by this geometry, we investigate whether training exhibits a two-timescale behavior: an early stage with substantial changes in activation patterns and a later stage where weight updates predominantly refine the model within largely stable activation regimes. We first prove a local stability property: outside measure-zero sets of parameters and inputs, sufficiently small parameter perturbations preserve the activation pattern of a fixed input, implying locally affine behavior within activation regions. We then empirically track per-iteration changes in weights and activation patterns across fully-connected and convolutional architectures, as well as Transformer-based models, where activation patterns are recorded in the ReLU feed-forward (MLP/FFN) submodules, using fixed validation subsets. Across the evaluated settings, activation-pattern changes decay roughly three times earlier than weight-update magnitudes, showing that late-stage training often proceeds within relatively stable activation regimes. These findings provide a concrete, architecture-agnostic instrument for monitoring training dynamics and motivate further study of decoupled optimization strategies for piecewise-linear networks. For reproducibility, code and experiment configurations will be released upon acceptance.
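The monitoring instrument the abstract describes, tracking per-iteration activation-pattern changes alongside weight-update magnitudes on a fixed validation subset, can be sketched as follows. This is a minimal illustrative NumPy sketch under assumed names (`activation_pattern`, a toy one-layer ReLU model, and a random perturbation standing in for a gradient step), not the authors' released code:

```python
import numpy as np

rng = np.random.default_rng(0)

def activation_pattern(W, b, X):
    """Boolean on/off pattern of each ReLU unit for every input in X.

    Two parameter settings that yield the same pattern on X place X in the
    same piecewise-linear (locally affine) region of the network.
    """
    return (X @ W.T + b) > 0.0

# Toy setup: 8 input features, 16 hidden ReLU units, a fixed validation subset.
W = rng.normal(size=(16, 8))
b = np.zeros(16)
X_val = rng.normal(size=(32, 8))

prev_pattern = activation_pattern(W, b, X_val)
for step in range(5):
    # Stand-in for an optimizer step: a small perturbation of the weights.
    dW = 0.01 * rng.normal(size=W.shape)
    W = W + dW

    pattern = activation_pattern(W, b, X_val)
    # Quantity 1: fraction of (input, unit) pairs whose ReLU region flipped.
    flip_rate = float(np.mean(pattern != prev_pattern))
    # Quantity 2: relative magnitude of the weight update.
    update_norm = float(np.linalg.norm(dW) / np.linalg.norm(W))
    prev_pattern = pattern
    print(f"step {step}: flip_rate={flip_rate:.4f}, update_norm={update_norm:.4f}")
```

In a real training run the perturbation `dW` would be the optimizer's actual update, and the two-timescale claim corresponds to `flip_rate` decaying to near zero well before `update_norm` does.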
Problem

Research questions and friction points this paper is trying to address.

training dynamics
activation patterns
piecewise-linear networks
two-timescale behavior
ReLU networks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Regime Change Hypothesis
activation pattern stability
two-timescale training dynamics
piecewise-linear networks
decoupled optimization
Cristian Pérez-Corral
Dept. of Computer Engineering, Universitat Politècnica de València, Valencia, Spain
Alberto Fernández-Hernández
Dept. of Computer Engineering, Universitat Politècnica de València, Valencia, Spain
Jose I. Mestre
Dept. of Computer Engineering, Universitat Jaume I, Castellón, Spain
Manuel F. Dolz
Universitat Jaume I
High Performance Computing, Energy Efficiency, Parallel Programming Models, Performance Analysis, Deep Learning
Jose Duato
Universitat Politècnica de València
Interconnection Networks, Multiprocessors
Enrique S. Quintana-Ortí
Universitat Politècnica de València, Spain