🤖 AI Summary
Gaussian Equivalence Theory (GET) is widely used to simplify the analysis of linear predictors under nonlinear embeddings in high-dimensional random features (RF) models; however, it systematically fails—yielding severely biased predictions of training and test errors—under quadratic scaling, particularly when the target function depends on low-dimensional data projections (e.g., generalized linear models).
Method: We propose the **Conditional Gaussian Equivalence Model (CGEM)**, which explicitly captures low-dimensional non-Gaussian structure to correct GET’s bias. Leveraging Wiener chaos expansion and a two-stage Lindeberg swapping technique, we rigorously derive exact asymptotic expressions for prediction errors.
Results: Theory and experiments consistently demonstrate that CGEM maintains high accuracy precisely where GET breaks down. Our work is the first to establish the *conditional* and *non-universal* nature of Gaussian equivalence in high-dimensional empirical risk minimization, providing a new statistical paradigm for characterizing nonlinear learning in high dimensions.
📝 Abstract
A major effort in modern high-dimensional statistics has been devoted to the analysis of linear predictors trained on nonlinear feature embeddings via empirical risk minimization (ERM). Gaussian equivalence theory (GET) has emerged as a powerful universality principle in this context: it states that the behavior of high-dimensional, complex features can be captured by Gaussian surrogates, which are more amenable to analysis. Despite its remarkable successes, numerical experiments show that this equivalence can fail even for simple embeddings -- such as polynomial maps -- under general scaling regimes. We investigate this breakdown in the setting of random feature (RF) models in the quadratic scaling regime, where both the number of features and the sample size grow quadratically with the data dimension. We show that when the target function depends on a low-dimensional projection of the data, such as generalized linear models, GET yields incorrect predictions. To capture the correct asymptotics, we introduce a Conditional Gaussian Equivalent (CGE) model, which can be viewed as appending a low-dimensional non-Gaussian component to an otherwise high-dimensional Gaussian model. This hybrid model retains the tractability of the Gaussian framework and accurately describes RF models in the quadratic scaling regime. We derive sharp asymptotics for the training and test errors in this setting, which continue to agree with numerical simulations even when GET fails. Our analysis combines general results on CLT for Wiener chaos expansions and a careful two-phase Lindeberg swapping argument. Beyond RF models and quadratic scaling, our work hints at a rich landscape of universality phenomena in high-dimensional ERM.