AI Summary
This work investigates the local geometric structure of the empirical risk function in high-dimensional Gaussian mixture models, focusing on spectral properties of the Hessian and (generalized) Fisher information matrices. Methodologically, it employs random matrix theory and proportional asymptotics to establish, for the first time, an exact correspondence between the parameter Gram matrix and the empirical spectral distribution. It rigorously characterizes both static spectral phase transitions (such as changes in the limiting spectral support and the existence of outliers) and dynamic spectral phase transitions (bifurcations in the evolution of eigenvalues and eigenvectors during training). Furthermore, an effective dynamical ODE system is constructed, yielding analytical limit formulas for outlier eigenvalues and their associated eigenvectors. Empirical validation across canonical settings, including multiclass logistic regression, demonstrates pronounced spectral divergence between the Hessian and Fisher information matrices, revealing geometry-driven phase-transition mechanisms in high-dimensional nonconvex optimization.
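For orientation, for an empirical risk of the form $R_n(\mathbf{x}) = \frac{1}{n}\sum_{\ell=1}^n \mathcal{L}(\mathbf{x}^\top Y_\ell)$, the two matrices being compared typically take the following form (a standard convention, not a quote from the paper; the exact normalizations used there may differ):

$$
\nabla^2 R_n(\mathbf{x}) = \frac{1}{n}\sum_{\ell=1}^{n} \nabla^2 \mathcal{L}(\mathbf{x}^\top Y_\ell),
\qquad
\mathbf{I}_n(\mathbf{x}) = \frac{1}{n}\sum_{\ell=1}^{n} \nabla \mathcal{L}(\mathbf{x}^\top Y_\ell)\, \nabla \mathcal{L}(\mathbf{x}^\top Y_\ell)^\top,
$$

where gradients are taken with respect to the vectorized parameters $\mathbf{x}$. The "spectral divergence" above refers to the limiting spectra of these two matrices differing markedly, even when evaluated at the same parameter value.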
Abstract
We study the local geometry of empirical risks in high dimensions via the spectral theory of their Hessian and information matrices. We focus on settings where the data, $(Y_\ell)_{\ell=1}^n \in \mathbb{R}^d$, are i.i.d. draws of a $k$-component Gaussian mixture model, and the loss depends on the projection of the data onto a fixed number of directions, namely $\mathbf{x}^\top Y$, where $\mathbf{x} \in \mathbb{R}^{d \times C}$ are the parameters and $C$ need not equal $k$. This setting captures a broad class of problems such as classification by one- and two-layer networks and regression on multi-index models. We prove exact formulas for the limits of the empirical spectral distribution and outlier eigenvalues and eigenvectors of such matrices in the proportional asymptotics limit, where the number of samples and the dimension satisfy $n, d \to \infty$ with $n/d = \phi \in (0,\infty)$. These limits depend on the parameters $\mathbf{x}$ only through the summary statistic given by the $(C+k) \times (C+k)$ Gram matrix of the parameters and class means, $\mathbf{G} = (\mathbf{x}, \boldsymbol{\mu})^\top (\mathbf{x}, \boldsymbol{\mu})$. It is known that, under general conditions, when $\mathbf{x}$ is trained by stochastic gradient descent, the evolution of these same summary statistics along training converges to the solution of an autonomous system of ODEs, called the effective dynamics. This enables us to connect the spectral theory to the training dynamics. We demonstrate our general results by analyzing the effective spectrum along the effective dynamics in the case of multi-class logistic regression. In this setting, the empirical Hessian and information matrices have substantially different spectra, each with its own static and even dynamical spectral transitions.
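As a concrete illustration of the setting, here is a minimal numerical sketch (not the authors' code; all names, normalizations, and the simplifying choice $C = k$ are ours) that draws data from a $k$-component Gaussian mixture, forms the empirical multi-class logistic risk as a function of the projections $\mathbf{x}^\top Y$, and computes the Hessian spectrum together with the Gram summary statistic $\mathbf{G}$ at finite $d$:

```python
# Minimal sketch (illustrative only, not the paper's code): empirical Hessian
# spectrum of multi-class logistic regression on a k-component Gaussian
# mixture in the proportional regime n/d = phi. We set C = k for simplicity.
import jax
import jax.numpy as jnp

d, phi, k = 100, 2.0, 3                  # dimension, aspect ratio n/d, classes
C = k                                    # number of parameter vectors (here = k)
n = int(phi * d)

key = jax.random.PRNGKey(0)
k_mu, k_lab, k_noise, k_x = jax.random.split(key, 4)
mu = jax.random.normal(k_mu, (d, k)) / jnp.sqrt(d)        # class means (O(1) norm,
                                                          # a modeling choice here)
labels = jax.random.randint(k_lab, (n,), 0, k)            # mixture assignments
Y = mu[:, labels].T + jax.random.normal(k_noise, (n, d))  # samples Y_ell

def risk(x_flat):
    """Empirical cross-entropy risk; sees the data only through x^T Y."""
    x = x_flat.reshape(d, C)
    logits = Y @ x                                        # projections, shape (n, C)
    return jnp.mean(jax.nn.logsumexp(logits, axis=1)
                    - logits[jnp.arange(n), labels])

x = jax.random.normal(k_x, (d, C)) / jnp.sqrt(d)          # parameters
H = jax.hessian(risk)(x.ravel())                          # (dC, dC) empirical Hessian
eigs = jnp.linalg.eigvalsh(H)
print("spectral range:", float(eigs[0]), float(eigs[-1]))

# The (C+k) x (C+k) Gram summary statistic G = (x, mu)^T (x, mu):
XM = jnp.concatenate([x, mu], axis=1)
G = XM.T @ XM
```

In the paper's regime one would take $d \to \infty$ at fixed $\phi$; the abstract's results then say that the limiting bulk of `eigs`, and any outliers separating from it, are determined by $\mathbf{G}$ alone, which is what makes it possible to track the spectrum along the effective dynamics of the summary statistics.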