Intrinsic training dynamics of deep neural networks

📅 2025-08-10
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the structured representation of implicit bias in deep neural networks: specifically, whether high-dimensional gradient flows in deep neural networks can be captured by low-dimensional intrinsic dynamics. Method: We formalize the notion of “intrinsic dynamics” and derive necessary and sufficient conditions for its existence. Leveraging path-lifting maps (a novel application to ReLU networks), we construct architecture-coupled, low-dimensional equivalent dynamical systems and generalize balanced initialization to relaxed equilibrium conditions. Contributions/Results: Rigorously grounded in gradient flow analysis, kernel inclusion criteria, and conservation law theory, we prove that, for ReLU networks of arbitrary depth, the training dynamics under any initialization are fully determined by the low-dimensional variable $z = φ(θ)$ and its initial value; for linear networks, relaxed balanced initializations guarantee the same property. In the infinite-depth linear limit (the linear neural ODE), we explicitly derive closed-form dynamical equations. Our results uncover a fundamental connection between architecture-induced implicit bias and conserved quantities, revealing how network structure constrains optimization trajectories through invariant principles.
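To make the “intrinsic dynamics” notion concrete, here is a minimal chain-rule sketch in our own notation (not reproduced from the paper), assuming the training loss factors through the lifting as $L(θ) = ℓ(φ(θ))$:

```latex
% Gradient flow on the parameters \theta, with loss L(\theta) = \ell(\varphi(\theta)):
\begin{align*}
  \dot{\theta} &= -\nabla L(\theta)
                = -D\varphi(\theta)^{\top}\,\nabla \ell\bigl(\varphi(\theta)\bigr),\\[2pt]
  \dot{z} &= D\varphi(\theta)\,\dot{\theta}
          = -\,D\varphi(\theta)\,D\varphi(\theta)^{\top}\,\nabla \ell(z).
\end{align*}
```

The dynamics are intrinsic when $Dφ(θ)\,Dφ(θ)^{\top}$ can be expressed as a function of $z$ (and, as in the paper, of the initialization), so that the equation for $\dot z$ closes in the low-dimensional variable alone.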

📝 Abstract
A fundamental challenge in the theory of deep learning is to understand whether gradient-based training in high-dimensional parameter spaces can be captured by simpler, lower-dimensional structures, leading to so-called implicit bias. As a stepping stone, we study when a gradient flow on a high-dimensional variable $θ$ implies an intrinsic gradient flow on a lower-dimensional variable $z = φ(θ)$, for an architecture-related function $φ$. We express a so-called intrinsic dynamic property and show how it is related to the study of conservation laws associated with the factorization $φ$. This leads to a simple criterion based on the inclusion of kernels of linear maps which yields a necessary condition for this property to hold. We then apply our theory to general ReLU networks of arbitrary depth and show that, for any initialization, it is possible to rewrite the flow as an intrinsic dynamic in a lower dimension that depends only on $z$ and the initialization, when $φ$ is the so-called path-lifting. In the case of linear networks with $φ$ the product of weight matrices, so-called balanced initializations are also known to enable such a dimensionality reduction; we generalize this result to a broader class of \emph{relaxed balanced} initializations, showing that, in certain configurations, these are the \emph{only} initializations that ensure the intrinsic dynamic property. Finally, for the linear neural ODE associated with the limit of infinitely deep linear networks, with relaxed balanced initialization, we explicitly express the corresponding intrinsic dynamics.
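As a concrete illustration of the balanced-initialization phenomenon discussed in the abstract, the following NumPy sketch (our own, with hypothetical dimensions and a least-squares loss; not the paper's code) trains a depth-2 linear network and checks that $W_1 W_1^\top - W_2^\top W_2$, a quantity conserved by the gradient flow, drifts only negligibly under small-step gradient descent, so a balanced initialization stays (approximately) balanced:

```python
import numpy as np

# Minimal sketch (our own, not the paper's code): a depth-2 linear network
# f(x) = W2 @ W1 @ x trained by plain gradient descent on a least-squares loss.
# Under the gradient *flow*, D = W1 @ W1.T - W2.T @ W2 is conserved, so a
# balanced initialization (D = 0) stays balanced; with a small step size,
# discrete gradient descent preserves D approximately.

rng = np.random.default_rng(0)
d_in, d_hid, d_out, n = 5, 4, 3, 50

X = rng.normal(size=(d_in, n))
T = rng.normal(size=(d_out, n))            # regression targets

W1 = rng.normal(scale=0.1, size=(d_hid, d_in))
W2 = rng.normal(scale=0.1, size=(d_out, d_hid))
D0 = W1 @ W1.T - W2.T @ W2                 # conserved quantity at initialization

lr = 1e-3
for _ in range(2000):
    Z = W2 @ W1                            # end-to-end matrix, z = phi(theta)
    grad_Z = (Z @ X - T) @ X.T / n         # gradient of the least-squares loss w.r.t. Z
    gW1, gW2 = W2.T @ grad_Z, grad_Z @ W1.T   # chain rule for the factors
    W1 -= lr * gW1
    W2 -= lr * gW2

drift = np.linalg.norm(W1 @ W1.T - W2.T @ W2 - D0)
print("final loss:", 0.5 * np.mean((W2 @ W1 @ X - T) ** 2))
print("drift of W1 W1^T - W2^T W2:", drift)   # stays small for small lr
```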
Problem

Research questions and friction points this paper is trying to address.

Understand gradient flow dimensionality reduction in deep networks
Establish intrinsic dynamics criterion via kernel inclusion
Generalize balanced initialization conditions for linear networks and their neural ODE limit
Innovation

Methods, ideas, or system contributions that make the work stand out.

Intrinsic gradient flow on lower-dimensional variable
Path-lifting enables dimensionality reduction for ReLU networks (see the sketch after this list)
Relaxed balanced initializations enable intrinsic dynamics
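A simplified illustration of the path-lifting idea referenced above (our own sketch, restricted to a bias-free one-hidden-layer ReLU network; the paper's path-lifting covers general architectures of arbitrary depth): the path coordinates are products of weights along input-to-output paths, and they are invariant to the neuron-wise rescalings that leave the realized function unchanged, which is what makes them natural low-dimensional variables for the dynamics.

```python
import numpy as np

# Simplified sketch (not the paper's construction): for a bias-free
# one-hidden-layer ReLU network f(x) = sum_j v[j] * relu(w[j] @ x), one choice
# of "path-lifting" collects the products of weights along each input->output
# path, phi(theta)[j, i] = v[j] * w[j, i]. These coordinates are invariant to
# the positive neuron-wise rescaling (w_j, v_j) -> (a * w_j, v_j / a), which
# also leaves the realized function unchanged.

rng = np.random.default_rng(1)
d_in, d_hid = 4, 6
W = rng.normal(size=(d_hid, d_in))   # hidden-layer weights
v = rng.normal(size=d_hid)           # output weights

def network(W, v, x):
    return v @ np.maximum(W @ x, 0.0)

def path_lifting(W, v):
    return v[:, None] * W            # one path product per (hidden unit, input)

# Apply a random positive neuron-wise rescaling.
a = rng.uniform(0.5, 2.0, size=d_hid)
W_rescaled, v_rescaled = a[:, None] * W, v / a

x = rng.normal(size=d_in)
assert np.allclose(network(W, v, x), network(W_rescaled, v_rescaled, x))
assert np.allclose(path_lifting(W, v), path_lifting(W_rescaled, v_rescaled))
print("function and path-lifting are both invariant to neuron-wise rescaling")
```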