🤖 AI Summary
This study investigates how deep neural networks evolve their feature representations during training. The authors propose a feature-centric analytical framework that characterizes feature dynamics through the weight Gram matrix, interprets gradient descent as implicitly driving a hypothetical evolution of features whose covariance structure they call the "Virtual Covariance," and introduces "Target Linearity" to quantify the degree of linear alignment between features and targets. The work connects the weight Gram matrix to feature linearization and uses this connection to give a unified interpretation of empirical phenomena such as Neural Collapse and linear interpolation in generative models. Analysis of training and layer-wise dynamics indicates that deep networks sequentially transform their internal representations toward a target-linear structure.
📝 Abstract
Understanding how deep neural networks learn representations remains a central challenge in machine learning theory. In this work, we propose a feature-centric framework for analyzing neural network training by relating weight updates to feature evolution. We introduce a simple identity, the Feature Learning Equation, which identifies the weight Gram matrix as the key object capturing feature dynamics. This enables us to interpret gradient descent as implicitly inducing a hypothetical evolution of features, whose covariance structure (termed the Virtual Covariance) characterizes how representations evolve during training. Building on this perspective, we introduce Target Linearity, a measure quantifying the linear alignment between features and targets. By analyzing both the training dynamics and the layer-wise dynamics, we show that deep networks learn to sequentially transform representations toward a target-linear structure. This linearization perspective provides a unified interpretation of several empirical phenomena, including Neural Collapse and linear interpolation in generative models.
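To make the two central objects more concrete, here is a minimal NumPy sketch. It rests on assumptions, since the abstract gives no formulas: `weight_gram` uses one common convention for a Gram matrix (W^T W), and `linear_fit_r2` is a hypothetical stand-in for a linear-alignment score (the R^2 of a least-squares affine fit), not the paper's definition of Target Linearity. Both function names are illustrative.

```python
import numpy as np


def weight_gram(W):
    """Gram matrix W^T W of a weight matrix W with shape (d_out, d_in).

    Assumption: the paper may use a different convention (e.g. W W^T).
    """
    return W.T @ W


def linear_fit_r2(features, targets):
    """R^2 of the best affine map from features to targets.

    Illustrative proxy only, NOT the paper's Target Linearity measure.
    features: (n, d) array, targets: (n, k) array.
    A value of 1 means the targets are an exact affine function of the features.
    """
    # Append a column of ones so the fit includes a bias term.
    X = np.hstack([features, np.ones((features.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(X, targets, rcond=None)
    residual = targets - X @ coef
    total = targets - targets.mean(axis=0, keepdims=True)
    return 1.0 - (residual ** 2).sum() / (total ** 2).sum()


rng = np.random.default_rng(0)

# Gram matrix of a random 4x3 weight matrix: a symmetric 3x3 matrix.
W = rng.normal(size=(4, 3))
G = weight_gram(W)

# Targets that are exactly linear in x.
x = rng.normal(size=(200, 3))
y = x @ np.array([[1.0], [-2.0], [0.5]])

r2_linear = linear_fit_r2(x, y)        # features linearly aligned with targets
r2_squared = linear_fit_r2(x ** 2, y)  # squaring discards the sign of x
```

In this toy setup the raw features score close to 1 and the squared features score much lower, which is the kind of contrast a linear-alignment measure is meant to expose when it is tracked across layers or training steps.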