🤖 AI Summary
This work challenges the prevailing view that the “spike-bulk” eigenvalue structure of the Hessian in deep neural networks arises solely from imbalanced data covariance. Through theoretical analysis of deep linear networks, the authors demonstrate for the first time that even under perfectly balanced data covariance, the Hessian spectrum spontaneously develops a distinct spike-bulk bifurcation purely due to network depth. Moreover, they establish a linear relationship between the ratio of spike to bulk eigenvalues and the network depth. These findings reveal that depth alone can shape the spectral properties of the optimization landscape, thereby questioning explanations centered on data imbalance and highlighting the critical role of architecture in governing optimization dynamics.
📝 Abstract
The eigenvalue distribution of the Hessian matrix plays a crucial role in understanding the optimization landscape of deep neural networks. Prior work has attributed the well-documented ``bulk-and-spike'' spectral structure, in which a few dominant eigenvalues separate from a bulk of smaller ones, to imbalance in the data covariance matrix. In this work, we challenge this view by demonstrating that such a spectral bifurcation can arise purely from the network architecture, independent of data imbalance. Specifically, we analyze a deep linear network and prove that, even when the data covariance is perfectly balanced, the Hessian still exhibits a bifurcated eigenvalue structure: a dominant cluster and a bulk cluster. Crucially, we establish that the ratio between dominant and bulk eigenvalues scales linearly with the network depth. This reveals that the spectral gap is strongly shaped by the network architecture rather than solely by the data distribution. Our results suggest that both model architecture and data characteristics should be considered when designing optimization algorithms for deep networks.
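A minimal numerical sketch of the phenomenon, in a toy setting of our own choosing (not the paper's construction): for a deep linear network $f(W) = \tfrac{1}{2}\lVert W_L \cdots W_1 - A\rVert_F^2$ with identity (perfectly balanced) data covariance, evaluated at the balanced global minimizer $W_i = I$ with target $A = I$, the Hessian splits into a "spike" cluster whose eigenvalues grow linearly with depth $L$ and a "bulk" cluster sitting near zero. All function and variable names below are illustrative.

```python
import numpy as np

def deep_linear_loss(params, depth, n, target):
    """0.5 * ||W_depth ... W_1 - target||_F^2 for a stack of flattened n x n layers."""
    Ws = params.reshape(depth, n, n)
    P = np.eye(n)
    for W in Ws:
        P = W @ P
    return 0.5 * np.sum((P - target) ** 2)

def numerical_hessian(f, x, h=1e-4):
    """Dense Hessian via second-order central finite differences."""
    d = x.size
    H = np.zeros((d, d))
    for i in range(d):
        for j in range(i, d):
            ei = np.zeros(d); ei[i] = h
            ej = np.zeros(d); ej[j] = h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h * h)
            H[j, i] = H[i, j]
    return H

n = 2  # layer width; the data covariance here is the (balanced) identity
spikes = {}
for depth in (2, 3, 4):
    # a global minimizer with every layer equal to the identity, target = identity
    x0 = np.stack([np.eye(n)] * depth).ravel()
    f = lambda p, d=depth: deep_linear_loss(p, d, n, np.eye(n))
    eigs = np.sort(np.linalg.eigvalsh(numerical_hessian(f, x0)))[::-1]
    spikes[depth] = eigs[0]  # top of the spike cluster
    print(f"depth={depth}: spike ~ {eigs[0]:.3f}, bulk ~ {eigs[n * n]:.3f}")
```

At this minimizer the Hessian is $J^\top J$ with $J = [I \,\cdots\, I]$ (one identity block per layer), so the spike eigenvalues equal the depth $L$ exactly while the remaining $(L-1)n^2$ eigenvalues vanish; the printout shows the spike growing as $2, 3, 4$. This is only a toy instance of depth-driven spike-bulk separation, not the paper's general result.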