🤖 AI Summary
Clustering on high-dimensional non-Euclidean manifolds, such as the manifold of symmetric positive definite (SPD) matrices, is computationally expensive and scales poorly because it relies on geodesic computations and iterative intrinsic optimization. To address this, we propose a fast $k$-means framework based on the $p$-Fréchet mapping. Our method constructs a set of reference points to embed manifold-valued data efficiently into a low-dimensional Euclidean space, where standard $k$-means is then applied. Compared with conventional intrinsic manifold clustering algorithms, our approach bypasses costly geodesic calculations and iterative manifold optimization, achieving up to a 100× speedup while preserving clustering accuracy. It remains robust and stable under challenging conditions, including high dimensionality, large-scale datasets, and noise corruption. Our key contribution is the first systematic application of the $p$-Fréchet mapping to scalable, theoretically grounded embedding of manifold data into Euclidean space, which offers an efficient and principled route to non-Euclidean clustering.
📝 Abstract
We introduce a novel, efficient framework for clustering data on high-dimensional, non-Euclidean manifolds that overcomes the computational challenges associated with standard intrinsic methods. The key innovation is the use of the $p$-Fréchet map $F^p : \mathcal{M} \to \mathbb{R}^\ell$, defined on a generic metric space $\mathcal{M}$, which embeds the manifold data into a lower-dimensional Euclidean space $\mathbb{R}^\ell$ using a set of reference points $\{r_i\}_{i=1}^\ell$, $r_i \in \mathcal{M}$. Once the data are embedded, we can efficiently and accurately apply standard Euclidean clustering techniques such as $k$-means. We rigorously analyze the mathematical properties of $F^p$ in Euclidean space and on the challenging manifold of $n \times n$ symmetric positive definite matrices $\mathit{SPD}(n)$. Extensive numerical experiments using synthetic and real $\mathit{SPD}(n)$ data demonstrate significant performance gains: our method reduces runtime by up to two orders of magnitude compared to intrinsic manifold-based approaches while maintaining high clustering accuracy, including in scenarios where existing alternative methods struggle or fail.
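The embed-then-cluster pipeline described above can be illustrated with a minimal NumPy sketch. It rests on several assumptions that the abstract does not confirm: the $i$-th coordinate of $F^p(x)$ is taken to be $d(x, r_i)^p$, the distance $d$ on $\mathit{SPD}(n)$ is taken to be the affine-invariant one, and the reference points are chosen by hand. The helper names (`spd_distance`, `frechet_map`, `kmeans`, `toy_spd_data`) are hypothetical and exist only for this illustration.

```python
import numpy as np


def spd_distance(a, b):
    """Affine-invariant distance on SPD(n) (an assumed choice of metric)."""
    w, v = np.linalg.eigh(a)
    a_inv_sqrt = (v / np.sqrt(w)) @ v.T          # A^{-1/2}
    m = a_inv_sqrt @ b @ a_inv_sqrt              # A^{-1/2} B A^{-1/2}
    eig = np.linalg.eigvalsh((m + m.T) / 2)      # symmetrize against round-off
    return float(np.sqrt(np.sum(np.log(eig) ** 2)))


def frechet_map(points, refs, p=2):
    """Assumed form of F^p: row j holds d(x_j, r_i)^p for i = 1..ell."""
    return np.array([[spd_distance(x, r) ** p for r in refs] for x in points])


def kmeans(x, k, n_iter=100):
    """Plain Lloyd iterations with deterministic farthest-point initialization."""
    centers = [x[0]]
    while len(centers) < k:
        d = np.min([np.linalg.norm(x - c, axis=1) for c in centers], axis=0)
        centers.append(x[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        # Recompute each center as its cluster mean; keep it if the cluster is empty.
        new_centers = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


def toy_spd_data(n_per_cluster=20, n=3, seed=0):
    """Two synthetic SPD clusters, near I and near 10*I (illustrative only)."""
    rng = np.random.default_rng(seed)
    points = []
    for scale in (1.0, 10.0):
        for _ in range(n_per_cluster):
            g = 0.1 * rng.normal(size=(n, n))
            w, v = np.linalg.eigh((g + g.T) / 2)
            points.append(scale * (v * np.exp(w)) @ v.T)   # scale * expm(sym)
    return points


points = toy_spd_data()
refs = [points[0], points[-1], 3.0 * np.eye(3)]   # hypothetical reference set, ell = 3
embedded = frechet_map(points, refs, p=2)          # shape (40, 3)
labels, _ = kmeans(embedded, k=2)
print(labels)
```

In this sketch the manifold distance is evaluated only $N \times \ell$ times, once per data point and reference point. All iterative work then happens in $\mathbb{R}^\ell$, which is the source of the speedup the abstract describes relative to intrinsic methods that recompute geodesic quantities at every iteration.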