Local geometry of high-dimensional mixture models: Effective spectral theory and dynamical transitions

šŸ“… 2025-02-21
šŸ“ˆ Citations: 0
✨ Influential: 0
šŸ“„ PDF
šŸ¤– AI Summary
This work investigates the local geometric structure of the empirical risk function in high-dimensional Gaussian mixture models, focusing on spectral properties of the Hessian and (generalized) Fisher information matrices. Methodologically, it employs random matrix theory and proportional asymptotics to establish, for the first time, an exact correspondence between the parameter Gram matrix and the empirical spectral distribution. It rigorously characterizes both static spectral phase transitions—such as the limiting spectral support and existence of outliers—and dynamic spectral phase transitions—namely, bifurcations in eigenvalue/eigenvector evolution during training. Furthermore, an effective dynamical ODE system is constructed, yielding analytical limit formulas for outlier eigenvalues and their associated eigenvectors. Empirical validation across canonical settings—including multiclass logistic regression—demonstrates pronounced spectral divergence between the Hessian and Fisher information matrix, revealing geometry-driven phase transition mechanisms in high-dimensional nonconvex optimization.


šŸ“ Abstract
We study the local geometry of empirical risks in high dimensions via the spectral theory of their Hessian and information matrices. We focus on settings where the data, $(Y_\ell)_{\ell=1}^n \in \mathbb{R}^d$, are i.i.d. draws of a $k$-component Gaussian mixture model, and the loss depends on the projection of the data onto a fixed number of vectors, namely $\mathbf{x}^\top Y$, where $\mathbf{x}\in \mathbb{R}^{d\times C}$ are the parameters, and $C$ need not equal $k$. This setting captures a broad class of problems such as classification by one- and two-layer networks and regression on multi-index models. We prove exact formulas for the limits of the empirical spectral distribution and outlier eigenvalues and eigenvectors of such matrices in the proportional asymptotics limit, where the number of samples and dimension $n,d\to\infty$ and $n/d=\phi \in (0,\infty)$. These limits depend on the parameters $\mathbf{x}$ only through the summary statistic of the $(C+k)\times(C+k)$ Gram matrix of the parameters and class means, $\mathbf{G} = (\mathbf{x},\boldsymbol{\mu})^\top(\mathbf{x},\boldsymbol{\mu})$. It is known that under general conditions, when $\mathbf{x}$ is trained by stochastic gradient descent, the evolution of these same summary statistics along training converges to the solution of an autonomous system of ODEs, called the effective dynamics. This enables us to connect the spectral theory to the training dynamics. We demonstrate our general results by analyzing the effective spectrum along the effective dynamics in the case of multi-class logistic regression. In this setting, the empirical Hessian and information matrices have substantially different spectra, each with its own static and even dynamical spectral transitions.
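To make the summary statistic concrete, here is a minimal NumPy sketch (not the paper's code; the dimensions `d`, `C`, `k` and the random draws of `x` and `mu` are invented for illustration) of the $(C+k)\times(C+k)$ Gram matrix $\mathbf{G} = (\mathbf{x},\boldsymbol{\mu})^\top(\mathbf{x},\boldsymbol{\mu})$ through which the limiting spectra depend on the parameters and class means:

```python
import numpy as np

# Hypothetical sizes: ambient dimension d, C parameter vectors, k mixture components.
rng = np.random.default_rng(0)
d, C, k = 1000, 3, 2

x = rng.standard_normal((d, C)) / np.sqrt(d)   # parameters, columns in R^d
mu = rng.standard_normal((d, k)) / np.sqrt(d)  # class means of the Gaussian mixture

M = np.concatenate([x, mu], axis=1)            # (x, mu), a d x (C + k) matrix
G = M.T @ M                                    # summary statistic: (C + k) x (C + k) Gram matrix

assert G.shape == (C + k, C + k)
assert np.allclose(G, G.T)                     # Gram matrices are symmetric
```

The point of the paper's result is that, in the proportional limit, this low-dimensional matrix `G` alone determines the limiting bulk spectrum and outliers, so the $d$-dimensional parameters enter only through these $(C+k)^2$ numbers.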
Problem

Research questions and friction points this paper is trying to address.

Analyzes the local geometry of empirical risk in high-dimensional Gaussian mixture models
Studies spectral properties of Hessian and information matrices
Connects spectral theory to training dynamics in machine learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Effective spectral theory for the Hessian and (generalized) Fisher information matrices
Exact formulas for empirical spectral distribution limits
Connects spectral theory to training dynamics via ODEs
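The last point can be sketched schematically: the summary statistics evolve under an autonomous ODE system (the effective dynamics), and an outlier eigenvalue is then read off from the summary statistics at each time. The right-hand side below is an invented toy relaxation, not the paper's actual effective dynamics, and the outlier formula used is the classical rank-one BBP expression, shown only to illustrate the pipeline:

```python
import numpy as np
from scipy.integrate import solve_ivp

def toy_effective_dynamics(t, g):
    # g: flattened 2x2 summary (Gram) matrix; toy flow relaxing toward the identity.
    G = g.reshape(2, 2)
    return (np.eye(2) - G).ravel()

# Integrate the autonomous system from zero initial overlap ("start of training").
sol = solve_ivp(toy_effective_dynamics, (0.0, 10.0), np.zeros(4))
G_final = sol.y[:, -1].reshape(2, 2)

# Read off a spectral outlier from the summary statistic: for a rank-one spike of
# strength theta added to a GOE matrix, the classical BBP formula gives an outlier
# at theta + 1/theta once theta > 1, and no outlier (bulk edge at 2) otherwise.
theta = G_final[0, 0]
outlier = theta + 1.0 / theta if theta > 1 else 2.0
```

The design mirrors the paper's two-step structure: solve a low-dimensional ODE for the summary statistics, then evaluate an analytical eigenvalue formula along that trajectory.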