Optimal Bound for PCA with Outliers using Higher-Degree Voronoi Diagrams

📅 2024-08-13
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Robust subspace estimation for Principal Component Analysis (PCA) in the presence of outliers remains computationally challenging, especially in high-dimensional, large-scale settings. Method: This paper introduces a geometric framework unifying computational and differential geometry: it pioneers the use of higher-degree Voronoi diagrams in robust PCA, establishing a rigorous geometric correspondence between these diagrams and optimal subspace recovery on the Grassmann manifold, and it further devises a manifold-aware, efficient randomized sampling scheme. Contributions/Results: The approach derives a theoretical optimal error bound for subspace recovery and achieves global convergence within time complexity $n^{d+\mathcal{O}(1)}\,\mathrm{poly}(n,d)$. It converges to the optimal low-dimensional subspace with probability $(1-\delta)^T$. By bridging discrete geometric structures with continuous manifold optimization, the method overcomes fundamental computational bottlenecks of conventional robust PCA, delivering substantial gains in both efficiency and accuracy for high-dimensional big-data applications.

📝 Abstract
In this paper, we introduce new algorithms for Principal Component Analysis (PCA) with outliers. Utilizing techniques from computational geometry, specifically higher-degree Voronoi diagrams, we navigate to the optimal subspace for PCA even in the presence of outliers. This approach achieves an optimal solution with a time complexity of $n^{d+\mathcal{O}(1)}\,\mathrm{poly}(n,d)$. Additionally, we present a randomized algorithm with a complexity of $2^{\mathcal{O}(r(d-r))} \times \mathrm{poly}(n, d)$. This algorithm samples subspaces characterized in terms of a Grassmannian manifold. By employing this sampling method, we ensure a high likelihood of capturing the optimal subspace, with success probability $(1 - \delta)^T$, where $\delta$ represents the probability that a sampled subspace does not contain the optimal solution, and $T$ is the number of subspaces sampled, proportional to $2^{r(d-r)}$. Our use of higher-degree Voronoi diagrams and Grassmannian-based sampling offers a clearer conceptual pathway and practical advantages, particularly in handling large datasets or higher-dimensional settings.
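The randomized algorithm described in the abstract can be sketched as follows. This is an illustrative simplification, not the paper's exact procedure: it draws uniform (Haar-distributed) $r$-dimensional subspaces via QR factorization of Gaussian matrices, scores each candidate on the data after trimming the farthest points as presumed outliers, and keeps the best. All function names and the trimming rule are assumptions for illustration.

```python
import numpy as np

def sample_subspace(d, r, rng):
    # Haar-uniform sample from the Grassmannian Gr(r, d):
    # orthonormalize a d x r Gaussian matrix via QR.
    A = rng.standard_normal((d, r))
    Q, _ = np.linalg.qr(A)
    return Q  # d x r matrix with orthonormal columns

def robust_pca_sampling(X, r, n_outliers, T, seed=0):
    """Illustrative sketch: sample T subspaces, score each on the
    n - n_outliers best-fitting points, return the best candidate."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    best_cost, best_Q = np.inf, None
    for _ in range(T):
        Q = sample_subspace(d, r, rng)
        # Distance of each point to its projection onto span(Q).
        resid = np.linalg.norm(X - X @ Q @ Q.T, axis=1)
        # Trim the n_outliers farthest points, score on the rest.
        inliers = np.sort(resid)[: n - n_outliers]
        cost = np.sum(inliers ** 2)
        if cost < best_cost:
            best_cost, best_Q = cost, Q
    return best_Q, best_cost
```

With $T$ proportional to $2^{r(d-r)}$ samples, as in the abstract, the scheme covers the Grassmannian densely enough that some candidate lies near the optimal subspace with high probability.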
Problem

Research questions and friction points this paper is trying to address.

PCA with outliers using Voronoi diagrams
Optimal subspace computation in $n^{d+\mathcal{O}(1)}$ time
Randomized Grassmannian sampling for high-dimensional data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses higher-degree Voronoi diagrams for PCA
Samples subspaces from Grassmannian manifold
Achieves the optimal solution in $n^{d+\mathcal{O}(1)}\,\mathrm{poly}(n,d)$ time (polynomial for fixed $d$)
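The key geometric object, a higher-degree (order-$k$) Voronoi diagram, partitions space into cells whose points share the same set of $k$ nearest sites. A minimal brute-force sketch of cell identification (not the paper's construction; the function name is hypothetical):

```python
import numpy as np

def order_k_cell(sites, q, k):
    """Return the sorted index set of the k nearest sites to query q.

    In an order-k Voronoi diagram, all queries sharing the same
    k-nearest index set lie in the same cell, so this tuple serves
    as the cell's label.
    """
    d2 = np.sum((sites - q) ** 2, axis=1)
    return tuple(sorted(np.argsort(d2)[:k].tolist()))
```

For example, with sites on the plane, every query near the origin whose two nearest sites are the same pair receives the same cell label.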
Sajjad Hashemian
Mohammad Saeed Arvenaghi
Ebrahim Ardeshir-Larijani