🤖 AI Summary
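Problem: For spectral estimators in high-dimensional generalized linear models, a rigorous performance characterization and a principled way to preprocess the data are available only for unstructured designs (i.i.d. Gaussian and Haar orthogonal), whereas realistic design matrices are anisotropic and strongly correlated.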
Method: The paper analyzes spectral estimators under correlated Gaussian designs, whose feature covariance is $\Sigma$, using a methodology based on approximate message passing (AMP).
Contribution/Results: The main result is a precise asymptotic characterization of the performance of spectral estimators under correlated Gaussian designs. This characterization identifies the optimal preprocessing, which minimizes the number of samples needed for parameter estimation and is universal across a broad set of designs, partly addressing a conjecture on optimal spectral estimators for rotationally invariant models. The approach vastly improves upon previous heuristic methods, including for designs common in computational imaging and genetics.
📝 Abstract
We consider the problem of parameter estimation in a high-dimensional generalized linear model. Spectral methods obtained via the principal eigenvector of a suitable data-dependent matrix provide a simple yet surprisingly effective solution. However, despite their wide use, a rigorous performance characterization, as well as a principled way to preprocess the data, are available only for unstructured (i.i.d. Gaussian and Haar orthogonal) designs. In contrast, real-world data matrices are highly structured and exhibit non-trivial correlations. To address the problem, we consider correlated Gaussian designs capturing the anisotropic nature of the features via a covariance matrix $\Sigma$. Our main result is a precise asymptotic characterization of the performance of spectral estimators. This allows us to identify the optimal preprocessing that minimizes the number of samples needed for parameter estimation. Surprisingly, such preprocessing is universal across a broad set of designs, which partly addresses a conjecture on optimal spectral estimators for rotationally invariant models. Our principled approach vastly improves upon previous heuristic methods, including for designs common in computational imaging and genetics. The proposed methodology, based on approximate message passing, is broadly applicable and opens the way to the precise characterization of spiked matrices and of the corresponding spectral methods in a variety of settings.
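To make the idea of a spectral estimator concrete, the following is a minimal sketch in Python, not the paper's method. It assumes a noiseless phase-retrieval-type model $y_i = \langle x_i, \beta \rangle^2$, a hypothetical diagonal covariance with a linearly decaying spectrum, and a simple heuristic preprocessing (trim, then center). The optimal covariance-adaptive preprocessing described in the abstract is not reproduced here, and the names `spectral_estimate`, `make_data`, and `trimmed_centered` are illustrative.

```python
import numpy as np


def spectral_estimate(X, y, preprocess):
    """Unit-norm principal eigenvector of D = X^T diag(T(y)) X / n."""
    n, _ = X.shape
    t = preprocess(y)                      # T(y_i), one weight per sample
    D = (X.T * t) @ X / n                  # scales row i of X by t[i], then forms X^T diag(t) X
    _, eigvecs = np.linalg.eigh(D)         # eigh returns eigenvalues in ascending order
    return eigvecs[:, -1]                  # eigenvector of the largest eigenvalue


def make_data(n, d, rng):
    """Correlated Gaussian design with an assumed diagonal covariance Sigma."""
    sigma_diag = np.linspace(2.0, 0.5, d)  # hypothetical anisotropic spectrum
    beta = rng.standard_normal(d)
    beta /= np.linalg.norm(beta)           # unit-norm ground-truth parameter
    X = rng.standard_normal((n, d)) * np.sqrt(sigma_diag)
    y = (X @ beta) ** 2                    # assumed GLM: noiseless phase retrieval
    return X, y, beta


def trimmed_centered(y, tau=5.0):
    """Heuristic preprocessing (an assumption): clip large responses, then center."""
    t = np.minimum(y, tau)
    return t - t.mean()


rng = np.random.default_rng(0)
X, y, beta = make_data(n=4000, d=50, rng=rng)
beta_hat = spectral_estimate(X, y, trimmed_centered)

# The sign of beta is not identifiable from y, so compare up to sign.
overlap = abs(beta_hat @ beta)
print(f"overlap with ground truth: {overlap:.3f}")
```

This naive version ignores $\Sigma$ when choosing the preprocessing, which is the gap the abstract describes. The printed overlap applies only to this toy instance and says nothing about the sample complexity results of the paper.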