🤖 AI Summary
This work investigates the continuity and stability of Fréchet k-means under the measured Gromov–Hausdorff topology in settings where both the underlying measure and the distance are unknown and must be estimated. Within a unified framework that combines Fréchet k-means, a stability analysis of the Voronoi clusters they determine, and metric learning procedures, the paper establishes consistency results that were previously unavailable, even for k = 1. These cover k-means based on Isomap and Fermat geodesic distances on manifolds, diffusion distances, and Wasserstein distances computed with respect to learned ground metrics. Uniqueness of the set of k-means is not assumed, although the results are stronger when it holds. The framework also applies beyond statistical inference, to first-passage percolation and discrete approximations of length spaces.
📝 Abstract
We study the Fréchet $k$-means of a metric measure space when both the measure and the distance are unknown and have to be estimated. We prove a general result stating that the $k$-means are continuous with respect to the measured Gromov-Hausdorff topology. In this situation, we also prove a stability result for the Voronoi clusters they determine. We do not assume uniqueness of the set of $k$-means, but when it is unique, the results are stronger. This framework provides a unified approach to proving consistency for a wide range of metric learning procedures. As concrete applications, we obtain new consistency results for several important estimators that were previously unestablished, even when $k=1$. These include $k$-means based on: (i) Isomap and Fermat geodesic distances on manifolds, (ii) diffusion distances, (iii) Wasserstein distances computed with respect to learned ground metrics. Finally, we consider applications beyond the statistical inference paradigm, such as (iv) first-passage percolation and (v) discrete approximations of length spaces.
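To make the objects in the abstract concrete, the following is a minimal sketch of an empirical Fréchet $k$-means computation on a finite sample, given only a matrix of estimated pairwise distances (for instance, graph-based geodesic estimates). It is an illustration under stated assumptions, not the paper's procedure: the squared-distance cost is the usual $k$-means convention and is assumed here, candidate centres are restricted to the sample points themselves, and the names `frechet_k_means_on_sample` and `voronoi_labels` are hypothetical.

```python
from itertools import combinations


def frechet_k_means_on_sample(dist, k, weights=None):
    """Brute-force minimiser of the empirical Frechet k-means cost.

    dist    : n x n nested list of (estimated) pairwise distances.
    k       : number of centres.
    weights : optional probability weights of the sample points
              (the estimated measure); uniform if omitted.

    Assumption: centres are searched only among the sample points,
    and the cost uses squared distances.
    """
    n = len(dist)
    if weights is None:
        weights = [1.0 / n] * n
    best_cost, best_centres = float("inf"), None
    for centres in combinations(range(n), k):
        # Weighted sum over points of the squared distance to the nearest centre.
        cost = sum(
            w * min(dist[i][c] for c in centres) ** 2
            for i, w in enumerate(weights)
        )
        # Strict inequality: on ties, the first minimiser found is kept.
        if cost < best_cost:
            best_cost, best_centres = cost, centres
    return best_centres, best_cost


def voronoi_labels(dist, centres):
    """Assign each sample point to the index of its nearest centre."""
    return [
        min(range(len(centres)), key=lambda j: dist[i][centres[j]])
        for i in range(len(dist))
    ]


# Usage: six points on a line, with the absolute difference as distance.
points = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in points] for a in points]
centres, cost = frechet_k_means_on_sample(dist, k=2)
labels = voronoi_labels(dist, centres)
```

The exhaustive search is exponential in $k$ and only suitable for small samples. Minimisers need not be unique (the loop keeps the first one it finds), which is the situation the abstract explicitly allows for.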