Fast $k$-means clustering in Riemannian manifolds via Fréchet maps: Applications to large-dimensional SPD matrices

📅 2025-11-12
🤖 AI Summary
Clustering on high-dimensional non-Euclidean manifolds, such as the manifold of symmetric positive definite (SPD) matrices, is computationally expensive and scales poorly because it relies on geodesic computations and iterative intrinsic optimization. To address this, we propose a fast $k$-means framework based on the $p$-Fréchet map. The method uses a set of reference points to embed manifold-valued data into a low-dimensional Euclidean space, where standard $k$-means is then applied. Compared with conventional intrinsic manifold clustering algorithms, the approach avoids costly geodesic calculations and iterative manifold optimization, achieving up to a 100× speedup while preserving clustering accuracy. It remains robust under challenging conditions, including high dimensionality, large datasets, and noise. The key contribution is the first systematic application of the $p$-Fréchet map for scalable, theoretically grounded embedding of manifold data into Euclidean space, offering an efficient and principled route to non-Euclidean clustering.

📝 Abstract
We introduce a novel, efficient framework for clustering data on high-dimensional, non-Euclidean manifolds that overcomes the computational challenges associated with standard intrinsic methods. The key innovation is the use of the $p$-Fréchet map $F^p : \mathcal{M} \to \mathbb{R}^\ell$ (defined on a generic metric space $\mathcal{M}$), which embeds the manifold data into a lower-dimensional Euclidean space $\mathbb{R}^\ell$ using a set of reference points $\{r_i\}_{i=1}^\ell$, $r_i \in \mathcal{M}$. Once embedded, we can efficiently and accurately apply standard Euclidean clustering techniques such as $k$-means. We rigorously analyze the mathematical properties of $F^p$ in the Euclidean space and the challenging manifold of $n \times n$ symmetric positive definite matrices $\mathit{SPD}(n)$. Extensive numerical experiments using synthetic and real $\mathit{SPD}(n)$ data demonstrate significant performance gains: our method reduces runtime by up to two orders of magnitude compared to intrinsic manifold-based approaches, all while maintaining high clustering accuracy, including scenarios where existing alternative methods struggle or fail.
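
For orientation, one plausible reading of the map is sketched below. The abstract does not give the formula for $F^p$, so the form shown (the $p$-th powers of the distances to the reference points) is an assumption, as is the choice of the affine-invariant Riemannian distance on $\mathit{SPD}(n)$; the paper may use a different normalization or metric.

```latex
% Assumed form of the p-Frechet map (not stated in the abstract):
F^p(x) = \bigl( d(x, r_1)^p, \dots, d(x, r_\ell)^p \bigr) \in \mathbb{R}^\ell,
\qquad x \in \mathcal{M}.

% Affine-invariant distance on SPD(n), one common choice of d (assumption):
d(A, B) = \bigl\| \log\bigl( A^{-1/2} B A^{-1/2} \bigr) \bigr\|_F
        = \Bigl( \sum_{i=1}^{n} \log^2 \lambda_i\bigl( A^{-1} B \bigr) \Bigr)^{1/2}.
```

Under this reading, each data point costs $\ell$ distance evaluations to embed, after which no further geodesic computations are needed during clustering.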
Problem

Research questions and friction points this paper is trying to address.

Clustering high-dimensional data on non-Euclidean manifolds efficiently
Overcoming computational challenges of intrinsic manifold clustering methods
Enabling fast k-means clustering for large SPD matrices
Innovation

Methods, ideas, or system contributions that make the work stand out.

Embedding manifold data into Euclidean space
Using Fréchet maps for dimensionality reduction
Applying k-means clustering after transformation
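
The three steps above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the $p$-Fréchet map sends a point to the vector of $p$-th powers of its distances to the reference points, that the distance on SPD matrices is the affine-invariant one, and that the reference points are picked by hand; none of these details are stated in the summary. The helper names (`airm_distance`, `frechet_map`, `kmeans`, `make_spd`) are hypothetical, and the $k$-means routine is plain Lloyd's algorithm with a deterministic farthest-first start.

```python
import numpy as np

def airm_distance(A, B):
    """Affine-invariant distance between SPD matrices (assumed metric)."""
    L = np.linalg.cholesky(A)            # A = L L^T
    Linv = np.linalg.inv(L)
    M = Linv @ B @ Linv.T                # symmetric, same eigenvalues as A^{-1} B
    eigvals = np.linalg.eigvalsh(M)
    return np.sqrt(np.sum(np.log(eigvals) ** 2))

def frechet_map(X, refs, p=2):
    """Assumed embedding: x -> (d(x, r_1)^p, ..., d(x, r_l)^p)."""
    return np.array([[airm_distance(x, r) ** p for r in refs] for x in X])

def kmeans(Z, k, n_iter=100):
    """Lloyd's algorithm on Euclidean points Z with farthest-first initialization."""
    centers = [Z[0]]
    for _ in range(1, k):
        # squared distance from every point to its nearest chosen center
        d2 = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

def make_spd(rng, n, scale):
    """Hypothetical test data: a random SPD matrix close to scale * I."""
    G = rng.standard_normal((n, n))
    return scale * (np.eye(n) + 0.01 * G @ G.T)

# Two synthetic groups of 3x3 SPD matrices, near I and near 10 * I.
rng = np.random.default_rng(0)
n = 3
X = [make_spd(rng, n, 1.0) for _ in range(20)] + \
    [make_spd(rng, n, 10.0) for _ in range(20)]
refs = [np.eye(n), 5.0 * np.eye(n)]      # hand-picked reference points (assumption)

Z = frechet_map(X, refs, p=2)            # shape (40, 2): Euclidean embedding
labels, centers = kmeans(Z, k=2)
print(labels)
```

The manifold distance is evaluated only while building `Z`, once per data point and reference point. The clustering loop itself works on ordinary vectors in $\mathbb{R}^\ell$, which is where the runtime savings over intrinsic methods would come from.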