Heavy-Tailed Principal Component Analysis

📅 2026-03-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the degradation of classical PCA under heavy-tailed data and impulsive noise, where reliance on second-order moments leads to poor performance. To tackle this in high-dimensional settings, the authors introduce a superstatistical model $\mathbf{X} = A^{1/2} \mathbf{G}$ and develop a log-loss-based PCA framework that remains well defined even when the variance is infinite. Theoretically, they prove that the principal components obtained by their method coincide with those from standard PCA applied to the covariance matrix of the underlying Gaussian generator $\mathbf{G}$. The study thus presents the first unified framework for handling infinite-variance heavy-tailed distributions, establishes consistency between the observed principal components and those of the latent Gaussian generator, and proposes robust estimators of that covariance matrix computed directly from heavy-tailed observations. Experiments demonstrate significant improvements over classical PCA under heavy-tailed and impulsive noise while maintaining competitive performance under Gaussian noise, with successful applications to tasks such as background denoising.

📝 Abstract
Principal Component Analysis (PCA) is a cornerstone of dimensionality reduction, yet its classical formulation relies critically on second-order moments and is therefore fragile in the presence of heavy-tailed data and impulsive noise. While numerous robust PCA variants have been proposed, most either assume finite variance, rely on sparsity-driven decompositions, or address robustness through surrogate loss functions without a unified treatment of infinite-variance models. In this paper, we study PCA for high-dimensional data generated according to a superstatistical dependent model of the form $\mathbf{X} = A^{1/2}\mathbf{G}$, where $A$ is a positive random scalar and $\mathbf{G}$ is a Gaussian vector. This framework captures a wide class of heavy-tailed distributions, including multivariate $t$ and sub-Gaussian $\alpha$-stable laws. We formulate PCA under a logarithmic loss, which remains well defined even when moments do not exist. Our main theoretical result shows that, under this loss, the principal components of the heavy-tailed observations coincide with those obtained by applying standard PCA to the covariance matrix of the underlying Gaussian generator. Building on this insight, we propose robust estimators for this covariance matrix directly from heavy-tailed data and compare them with the empirical covariance and Tyler's scatter estimator. Extensive experiments, including background denoising tasks, demonstrate that the proposed approach reliably recovers principal directions and significantly outperforms classical PCA in the presence of heavy-tailed and impulsive noise, while remaining competitive under Gaussian noise.
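The superstatistical model in the abstract can be illustrated numerically. Below is a minimal sketch, not the authors' method: drawing $A$ from an inverse-gamma distribution makes $\mathbf{X} = A^{1/2}\mathbf{G}$ a multivariate $t$ with $\nu$ degrees of freedom, and $\nu = 1.5$ gives infinite variance. Tyler's scatter estimator, one of the baselines named in the abstract, is compared against the sample covariance for recovering the top principal direction; the covariance matrix, dimensions, and sample size are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 20000

# Covariance of the Gaussian generator G, with one dominant principal direction.
Sigma = np.diag([10.0, 1.0, 1.0, 1.0, 1.0])
G = rng.multivariate_normal(np.zeros(d), Sigma, size=n)

# Superstatistical mixing: A ~ inverse-gamma(nu/2, nu/2) makes X multivariate t
# with nu degrees of freedom; nu = 1.5 means the variance of X is infinite.
nu = 1.5
A = 1.0 / rng.gamma(shape=nu / 2, scale=2.0 / nu, size=n)
X = np.sqrt(A)[:, None] * G

# Classical PCA: top eigenvector of the sample covariance (unreliable here).
_, V = np.linalg.eigh(np.cov(X.T))
u_sample = V[:, -1]

# Tyler's scatter estimator via its standard fixed-point iteration:
#   S <- mean_i [ d / (x_i^T S^{-1} x_i) * x_i x_i^T ], renormalized each step.
S = np.eye(d)
for _ in range(100):
    q = np.einsum("ij,jk,ik->i", X, np.linalg.inv(S), X)  # quadratic forms
    S_new = (d / q)[:, None, None] * np.einsum("ij,ik->ijk", X, X)
    S = S_new.mean(axis=0)
    S = d * S / np.trace(S)
_, V = np.linalg.eigh(S)
u_tyler = V[:, -1]

e1 = np.eye(d)[:, 0]  # true principal direction of Sigma
print("sample-cov alignment:", abs(u_sample @ e1))
print("Tyler alignment:     ", abs(u_tyler @ e1))
```

Tyler's estimator is insensitive to the random scale $A$ because its weights depend on each observation only through its direction, which is why it remains consistent for elliptical data even when moments do not exist.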
Problem

Research questions and friction points this paper is trying to address.

Heavy-tailed distributions
Principal Component Analysis
Robust PCA
Infinite variance
Impulsive noise
Innovation

Methods, ideas, or system contributions that make the work stand out.

Heavy-tailed PCA
Logarithmic loss
Superstatistical model
Robust covariance estimation
Infinite-variance distributions
Mario Sayde
Electrical and Computer Engineering Department, American University of Beirut, Riad El-Solh, Beirut 1107 2020, Lebanon
Christopher Khater
Electrical and Computer Engineering Department, American University of Beirut, Riad El-Solh, Beirut 1107 2020, Lebanon
Jihad Fahs
American University of Beirut
Information Theory
Estimation Theory
Heavy-Tailed Distributions
Wireless Communications
Ibrahim Abou-Faycal
Professor of Electrical and Computer Engineering, American University of Beirut, Lebanon
Information Theory