Holdout cross-validation for large non-Gaussian covariance matrix estimation using Weingarten calculus

📅 2025-09-17
📈 Citations: 0
✹ Influential: 0
đŸ€– AI Summary
This work investigates the Frobenius error behavior of holdout cross-validation for large covariance matrix estimation under non-Gaussian, high-dimensional settings. We consider a rotationally invariant multiplicative noise model and—extending prior Gaussian analyses—combine Weingarten calculus with the Ledoit–PĂ©chĂ© eigenvalue formula to rigorously derive an analytical expression for the expected estimation error under general non-Gaussian distributions. We propose linear and quadratic shrinkage estimators and validate our theoretical predictions via inverse Wishart modeling and Monte Carlo simulations. Key findings include: (i) the optimal training-to-test split ratio asymptotically scales with the square root of the dimension; (ii) quadratic shrinkage substantially reduces estimation error compared to linear shrinkage; and (iii) the noise kurtosis (fourth moment) governs both the curvature of the error curve and the location of the optimal split. This work provides the first quantitative, non-Gaussian theoretical framework characterizing cross-validation in high-dimensional covariance estimation.
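The holdout procedure described above can be sketched numerically: split the data, estimate eigenvectors on the train set, re-estimate each eigenvalue as the test-set variance along that eigenvector, and compare Frobenius errors. The sketch below uses plain Gaussian data and a simple population covariance as stand-ins; the paper's actual model (rotationally invariant multiplicative noise, inverse-Wishart prior) is not reproduced here, and all sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: dimension n, total samples T, train/test split point.
n, T = 50, 400
T_train = 300

# Placeholder population covariance (identity plus a weak uniform mode);
# the paper instead draws C from an inverse-Wishart distribution.
C = np.eye(n) + 0.5 * np.outer(np.ones(n), np.ones(n)) / n

# Gaussian data as a stand-in for the paper's multiplicative noise model.
X = rng.multivariate_normal(np.zeros(n), C, size=T)
X_train, X_test = X[:T_train], X[T_train:]

# Sample covariance matrices on each split.
E_train = X_train.T @ X_train / T_train
E_test = X_test.T @ X_test / (T - T_train)

# Holdout estimator: keep the train eigenvectors, replace each eigenvalue
# by the test-set variance in that eigendirection.
vals, vecs = np.linalg.eigh(E_train)
xi = np.einsum('ij,jk,ki->i', vecs.T, E_test, vecs)
Xi = vecs @ np.diag(xi) @ vecs.T

# Per-dimension squared Frobenius errors: raw sample covariance vs holdout.
err_sample = np.linalg.norm(E_train - C, 'fro') ** 2 / n
err_holdout = np.linalg.norm(Xi - C, 'fro') ** 2 / n
print(err_sample, err_holdout)
```

In typical runs with this aspect ratio, the holdout estimator improves on the raw sample covariance, which is the effect whose expected error the paper characterizes analytically.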

📝 Abstract
Cross-validation is one of the most widely used methods for model selection and evaluation; its efficiency for large covariance matrix estimation appears robust in practice, but little is known about the theoretical behavior of its error. In this paper, we derive the expected Frobenius error of the holdout method, a particular cross-validation procedure that involves a single train and test split, for a generic rotationally invariant multiplicative noise model, therefore extending previous results to non-Gaussian data distributions. Our approach involves using the Weingarten calculus and the Ledoit-Péché formula to derive the oracle eigenvalues in the high-dimensional limit. When the population covariance matrix follows an inverse Wishart distribution, we approximate the expected holdout error, first with a linear shrinkage, then with a quadratic shrinkage to approximate the oracle eigenvalues. Under the linear approximation, we find that the optimal train-test split ratio is proportional to the square root of the matrix dimension. Then we compute Monte Carlo simulations of the holdout error for different distributions of the norm of the noise, such as the Gaussian, Student, and Laplace distributions and observe that the quadratic approximation yields a substantial improvement, especially around the optimal train-test split ratio. We also observe that a higher fourth-order moment of the Euclidean norm of the noise vector sharpens the holdout error curve near the optimal split and lowers the ideal train-test ratio, making the choice of the train-test ratio more important when performing the holdout method.
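Two ingredients of the abstract can be made concrete in a short sketch: a linear shrinkage estimator of the usual Ledoit-Wolf form, and finding that the optimal train-test ratio grows like the square root of the dimension. Both the shrinkage coefficients and the proportionality constant below are placeholders, not the paper's fitted values.

```python
import numpy as np

def linear_shrinkage(E, alpha):
    """Linear shrinkage of a sample covariance toward the scaled identity.

    Xi = alpha * E + (1 - alpha) * (tr(E) / n) * I
    Standard Ledoit-Wolf-style form; the paper's coefficients come from
    its inverse-Wishart calculation and are not reproduced here.
    """
    n = E.shape[0]
    return alpha * E + (1.0 - alpha) * (np.trace(E) / n) * np.eye(n)

def approx_optimal_test_size(T, n, c=1.0):
    """Illustrative reading of the sqrt-scaling result: with T total samples
    and dimension n, a train-to-test ratio of order sqrt(n) corresponds to a
    test set of order T / sqrt(n). The constant c is hypothetical.
    """
    return max(1, int(round(c * T / np.sqrt(n))))
```

For example, `approx_optimal_test_size(1000, 100)` allots a smaller test set than `approx_optimal_test_size(1000, 4)`, reflecting that higher-dimensional problems favor devoting more data to training.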
Problem

Research questions and friction points this paper is trying to address.

Estimating large non-Gaussian covariance matrices using holdout cross-validation
Deriving theoretical error bounds for holdout method with rotational invariance
Determining optimal train-test split ratios for covariance estimation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Weingarten calculus for non-Gaussian covariance estimation
Quadratic shrinkage approximating oracle eigenvalues
Optimal train-test split ratio proportional to the square root of the dimension
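The quadratic-shrinkage idea listed above amounts to cleaning the sample eigenvalues with a quadratic polynomial instead of an affine map. A minimal sketch, with placeholder coefficients (the paper fits them to the Ledoit-Péché oracle eigenvalues under an inverse-Wishart prior):

```python
import numpy as np

def polynomial_shrinkage(E, coeffs):
    """Shrink a sample covariance by mapping its eigenvalues through a polynomial.

    coeffs = (a0, a1) gives the linear shrinkage xi(l) = a0 + a1 * l;
    coeffs = (a0, a1, a2) gives the quadratic xi(l) = a0 + a1 * l + a2 * l**2.
    Eigenvectors are kept unchanged; only eigenvalues are cleaned.
    """
    vals, vecs = np.linalg.eigh(E)
    xi = np.polynomial.polynomial.polyval(vals, coeffs)
    return vecs @ np.diag(xi) @ vecs.T
```

The quadratic term lets the cleaned eigenvalues bend toward the oracle curve, which is why the paper reports a substantial error reduction over the linear fit, especially near the optimal split.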
Lamia Lamrani
Laboratoire de Mathématiques et Informatique pour la Complexité et les SystÚmes, Université Paris-Saclay, CentraleSupélec, 91192 Gif-sur-Yvette, France
BenoĂźt Collins
Professor, Kyoto University
mathematics
Jean-Philippe Bouchaud
Head of Research, CFM
Statistical mechanics, Disordered systems, Random Matrices, Quantitative Finance, Agent Based Models