🤖 AI Summary
The excess risk of PCA is governed by the geometric structure of the data distribution, specifically the decay of the eigenvalue spectrum and the curvature of the underlying subspace geometry. To characterize this, we establish a central limit theorem for the principal subspace estimation error on the Grassmann manifold. Crucially, we show that the negative block Rayleigh quotient is generalized self-concordant along geodesics emanating from its minimizer, a geometric property that enables non-asymptotic risk analysis. Combining tools from random matrix theory, differential geometry, and asymptotic statistical inference, we derive a tight non-asymptotic upper bound on the excess risk that recovers the exact asymptotic behavior and precisely characterizes the limiting distribution of the reconstruction error. Our core contributions are threefold: (i) uncovering the intrinsic geometric nature of PCA risk; (ii) constructing the first asymptotic error theory for PCA with an explicit geometric interpretation; and (iii) moving beyond classical spectral perturbation analysis through a principled manifold-based framework.
📝 Abstract
What property of the data distribution determines the excess risk of principal component analysis? In this paper, we provide a precise answer to this question. We establish a central limit theorem for the error of the principal subspace estimated by PCA, and derive the asymptotic distribution of its excess risk under the reconstruction loss. We obtain a non-asymptotic upper bound on the excess risk of PCA that recovers, in the large sample limit, our asymptotic characterization. Underlying our contributions is the following result: we prove that the negative block Rayleigh quotient, defined on the Grassmannian, is generalized self-concordant along geodesics emanating from its minimizer with maximum rotation less than $\pi/4$.
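For readers unfamiliar with the objects named in the abstract, a standard formulation of the negative block Rayleigh quotient and the reconstruction loss may be sketched as follows (the notation here is ours, not necessarily the paper's):

```latex
% Assumed setup: \Sigma \in \mathbb{R}^{d \times d} is the population
% covariance of a centered random vector x, and a k-dimensional subspace
% is represented by U \in \mathbb{R}^{d \times k} with U^\top U = I_k.

% Negative block Rayleigh quotient:
f(U) \;=\; -\operatorname{tr}\!\left(U^\top \Sigma\, U\right).

% f(UO) = f(U) for every orthogonal O \in O(k), so f depends only on the
% column span of U and descends to a function on the Grassmannian Gr(k, d).

% Reconstruction loss and excess risk (with \mathbb{E}[x x^\top] = \Sigma):
R(U) \;=\; \mathbb{E}\,\bigl\lVert x - U U^\top x \bigr\rVert^2
      \;=\; \operatorname{tr}(\Sigma) + f(U),
\qquad
\mathcal{E}(U) \;=\; R(U) - \min_{V^\top V = I_k} R(V).
```

Under this formulation, minimizing the reconstruction loss over subspaces is equivalent to minimizing $f$ on the Grassmannian, whose minimizer is the span of the top-$k$ eigenvectors of $\Sigma$.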