When does Gaussian equivalence fail and how to fix it: Non-universal behavior of random features with quadratic scaling

📅 2025-12-03
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Gaussian equivalence theory (GET) is widely used to simplify the analysis of linear predictors trained on nonlinear feature embeddings in high-dimensional random feature (RF) models. However, it fails systematically under quadratic scaling, yielding severely biased predictions of training and test errors, particularly when the target function depends on a low-dimensional projection of the data (e.g., generalized linear models). Method: the paper proposes the **Conditional Gaussian Equivalent (CGE) model**, which explicitly captures the low-dimensional non-Gaussian structure that GET discards. Leveraging Wiener chaos expansions and a two-stage Lindeberg swapping technique, the authors rigorously derive exact asymptotic expressions for the prediction errors. Results: theory and experiments consistently demonstrate that the CGE model remains accurate precisely where GET breaks down. The work is the first to establish the *conditional* and *non-universal* nature of Gaussian equivalence in high-dimensional empirical risk minimization, providing a new statistical paradigm for characterizing nonlinear learning in high dimensions.

📝 Abstract
A major effort in modern high-dimensional statistics has been devoted to the analysis of linear predictors trained on nonlinear feature embeddings via empirical risk minimization (ERM). Gaussian equivalence theory (GET) has emerged as a powerful universality principle in this context: it states that the behavior of high-dimensional, complex features can be captured by Gaussian surrogates, which are more amenable to analysis. Despite its remarkable successes, numerical experiments show that this equivalence can fail even for simple embeddings -- such as polynomial maps -- under general scaling regimes. We investigate this breakdown in the setting of random feature (RF) models in the quadratic scaling regime, where both the number of features and the sample size grow quadratically with the data dimension. We show that when the target function depends on a low-dimensional projection of the data, such as generalized linear models, GET yields incorrect predictions. To capture the correct asymptotics, we introduce a Conditional Gaussian Equivalent (CGE) model, which can be viewed as appending a low-dimensional non-Gaussian component to an otherwise high-dimensional Gaussian model. This hybrid model retains the tractability of the Gaussian framework and accurately describes RF models in the quadratic scaling regime. We derive sharp asymptotics for the training and test errors in this setting, which continue to agree with numerical simulations even when GET fails. Our analysis combines general results on CLT for Wiener chaos expansions and a careful two-phase Lindeberg swapping argument. Beyond RF models and quadratic scaling, our work hints at a rich landscape of universality phenomena in high-dimensional ERM.
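To make the setup concrete, here is a minimal numerical sketch of the Gaussian-equivalence surrogate the abstract describes: each random feature σ(wᵀx) is replaced by μ₀ + μ₁·wᵀx + μ⋆·g with g an independent Gaussian. This is not the authors' code; the dimensions, ReLU activation, tanh single-index target, ridge penalty, and linear (rather than quadratic) scaling are all illustrative assumptions chosen so the simulation runs quickly. The paper's point is that in the quadratic regime n, p ∝ d², the two test errors computed below would cease to agree for such low-dimensional targets.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 40          # data dimension (illustrative)
n = 6 * d       # sample size; the paper's quadratic regime n ~ d^2 is costlier to simulate
p = 3 * d       # number of random features
lam = 1e-2      # ridge penalty (illustrative)

def act(z):
    # ReLU activation (illustrative choice of nonlinearity)
    return np.maximum(z, 0.0)

# Estimate the Gaussian (Hermite) coefficients of act for z ~ N(0, 1):
# mu0 = E[act(z)], mu1 = E[z * act(z)], mu_star^2 = Var(act(z)) - mu1^2
z = rng.standard_normal(200_000)
mu0 = act(z).mean()
mu1 = (z * act(z)).mean()
mu_star = np.sqrt(max(act(z).var() - mu1 ** 2, 0.0))

W = rng.standard_normal((p, d)) / np.sqrt(d)   # feature weights, rows of ~unit norm
beta = rng.standard_normal(d) / np.sqrt(d)     # hidden single-index direction

def target(X):
    # generalized linear model: y depends on a one-dimensional projection of x
    return np.tanh(X @ beta)

def ridge_test_error(F_tr, y_tr, F_te, y_te):
    # ridge regression on features F, evaluated by mean squared test error
    w = np.linalg.solve(F_tr.T @ F_tr + lam * np.eye(F_tr.shape[1]), F_tr.T @ y_tr)
    return np.mean((F_te @ w - y_te) ** 2)

Xtr, Xte = rng.standard_normal((n, d)), rng.standard_normal((n, d))
ytr, yte = target(Xtr), target(Xte)

# True random features ...
F_tr, F_te = act(Xtr @ W.T), act(Xte @ W.T)

# ... and their GET surrogate: mu0 + mu1 * x^T w_j + mu_star * (independent Gaussian)
G_tr = mu0 + mu1 * (Xtr @ W.T) + mu_star * rng.standard_normal((n, p))
G_te = mu0 + mu1 * (Xte @ W.T) + mu_star * rng.standard_normal((n, p))

err_rf = ridge_test_error(F_tr, ytr, F_te, yte)
err_get = ridge_test_error(G_tr, ytr, G_te, yte)
print(f"RF test error: {err_rf:.3f}   GET surrogate test error: {err_get:.3f}")
```

In this linear-scaling toy run the two errors should be close, which is the universality GET predicts; the paper's CGE model is designed for the quadratic regime, where this agreement provably fails.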
Problem

Research questions and friction points this paper is trying to address.

Investigates failure of Gaussian equivalence in random feature models with quadratic scaling.
Introduces Conditional Gaussian Equivalent model to correct non-universal behavior predictions.
Derives asymptotics for training and test errors when target depends on low-dimensional projections.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces the Conditional Gaussian Equivalent model for non-universal behavior.
Combines a low-dimensional non-Gaussian component with a high-dimensional Gaussian model.
Uses CLT results for Wiener chaos and Lindeberg swapping to derive sharp asymptotics.
Garrett G. Wen
Department of Statistics and Data Science, Yale University
Hong Hu
Department of Electrical and Systems Engineering, Washington University in Saint Louis; Department of Statistics and Data Science, Washington University in Saint Louis
Yue M. Lu
Gordon McKay Professor of Electrical Engineering and of Applied Mathematics, Harvard University
Signal and information processing
Zhou Fan
PhD Student in Computer Science, Harvard University
Theodor Misiakiewicz
Assistant Professor, Yale University
machine learning · probability theory · statistics