🤖 AI Summary
Robust subspace estimation for Principal Component Analysis (PCA) in the presence of outliers remains computationally challenging, especially in high-dimensional, large-scale settings.
Method: This paper introduces a novel geometric framework unifying computational and differential geometry. It pioneers the use of higher-order Voronoi diagrams in robust PCA, establishing a rigorous geometric correspondence between these diagrams and optimal subspace recovery on the Grassmann manifold, and it further devises a manifold-aware randomized sampling scheme whose cost grows exponentially only in the subspace parameters $r(d-r)$.
Contributions/Results: The approach derives a theoretically optimal error bound for subspace recovery and achieves global convergence within time complexity $n^{d+\mathcal{O}(1)}\,\mathrm{poly}(n,d)$. It converges to the optimal low-dimensional subspace with probability $(1-\delta)^T$. By bridging discrete geometric structures with continuous manifold optimization, the method overcomes fundamental computational bottlenecks of conventional robust PCA, delivering substantial gains in both efficiency and accuracy for high-dimensional big-data applications.
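As a concrete instantiation of the stated complexity terms (the specific values $d = 10$, $r = 2$ are chosen purely for illustration and do not come from the paper):

```latex
% For ambient dimension d = 10 and target rank r = 2:
%   number of sampled subspaces   T \propto 2^{r(d-r)} = 2^{16} = 65536,
%   randomized algorithm cost     2^{\mathcal{O}(r(d-r))} \cdot \mathrm{poly}(n, d),
%   exact algorithm cost          n^{d+\mathcal{O}(1)} \, \mathrm{poly}(n, d)
%                                 = n^{10+\mathcal{O}(1)} \, \mathrm{poly}(n, d).
```

The comparison shows why the randomized variant is attractive: its exponential factor depends on $r(d-r)$ rather than on $d$ in the exponent of $n$.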
📝 Abstract
In this paper, we introduce new algorithms for Principal Component Analysis (PCA) with outliers. Utilizing techniques from computational geometry, specifically higher-degree Voronoi diagrams, we navigate to the optimal subspace for PCA even in the presence of outliers. This approach achieves an optimal solution with a time complexity of $n^{d+\mathcal{O}(1)}\,\mathrm{poly}(n,d)$. Additionally, we present a randomized algorithm with a complexity of $2^{\mathcal{O}(r(d-r))} \times \mathrm{poly}(n, d)$, which samples subspaces characterized in terms of the Grassmannian manifold. This sampling method ensures a high likelihood of capturing the optimal subspace, with success probability $(1 - \delta)^T$, where $\delta$ represents the probability that a sampled subspace does not contain the optimal solution, and $T$ is the number of subspaces sampled, proportional to $2^{r(d-r)}$. Our use of higher-degree Voronoi diagrams and Grassmannian-based sampling offers a clearer conceptual pathway and practical advantages, particularly in handling large datasets or higher-dimensional settings.
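The abstract's randomized algorithm rests on drawing candidate subspaces uniformly from the Grassmannian and keeping the best robust fit. A minimal sketch of that sampling loop is below, using the standard QR-of-Gaussian construction for Haar-uniform subspaces; the function names, the inlier-count parameter, and the sum-of-smallest-residuals score are illustrative assumptions, not the paper's actual scoring rule:

```python
import numpy as np

def sample_grassmann(d, r, rng):
    """Draw a Haar-uniform r-dimensional subspace of R^d.

    QR decomposition of an i.i.d. Gaussian matrix yields an
    orthonormal basis uniformly distributed on Gr(r, d).
    """
    G = rng.standard_normal((d, r))
    Q, R = np.linalg.qr(G)
    # Fix column signs so the distribution is exactly Haar-uniform.
    Q = Q * np.sign(np.diag(R))
    return Q  # d x r matrix with orthonormal columns

def residual(X, Q, n_inliers):
    """Illustrative robust score: sum of the n_inliers smallest
    squared distances from the rows of X to span(Q)."""
    proj = X @ Q @ Q.T
    dist2 = np.sum((X - proj) ** 2, axis=1)
    return np.sort(dist2)[:n_inliers].sum()

def sample_and_select(X, r, n_inliers, T, seed=0):
    """Keep the best of T independently sampled subspaces."""
    rng = np.random.default_rng(seed)
    best_Q, best_score = None, np.inf
    for _ in range(T):
        Q = sample_grassmann(X.shape[1], r, rng)
        score = residual(X, Q, n_inliers)
        if score < best_score:
            best_Q, best_score = Q, score
    return best_Q, best_score
```

In this sketch the number of trials `T` plays the role of the paper's sample count proportional to $2^{r(d-r)}$: each draw independently has some chance of landing near the optimal subspace, so more draws increase the likelihood of capturing it.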