🤖 AI Summary
This paper addresses the trade-off between approximation ratio and running time in Euclidean $k$-means clustering. It proposes the first local search algorithm that achieves an $O(c)$-approximation in $\tilde{O}(n^{1+1/c})$ time, for any constant $c \ge 1$. The method introduces: (1) a new 1-swap local search framework on general graphs; (2) a swap selection rule that scores each candidate swap by both its modification to the clustering and its improvement to the clustering objective; and (3) a data structure that maintains approximate nearest neighbors with amortized guarantees. Because the framework operates on graphs, it extends to any metric space that admits sparse spanners, including $\ell_p$, doubling, and Jaccard metrics. Compared with the previous, non-local-search approach, which gives an $O(c^6)$-approximation in the same running time, the approximation factor improves to $O(c)$.
📝 Abstract
We propose the first \emph{local search} algorithm for Euclidean clustering that attains an $O(1)$-approximation in almost-linear time. Specifically, for Euclidean $k$-Means, our algorithm achieves an $O(c)$-approximation in $\tilde{O}(n^{1 + 1 / c})$ time, for any constant $c \ge 1$, maintaining the same running time as the previous (non-local-search-based) approach [la Tour and Saulpic, arXiv'2407.11217] while improving the approximation factor from $O(c^{6})$ to $O(c)$. The algorithm generalizes to any metric space with sparse spanners, delivering efficient constant approximation in $\ell_p$ metrics, doubling metrics, Jaccard metrics, etc. This generality derives from our main technical contribution: a local search algorithm on general graphs that obtains an $O(1)$-approximation in almost-linear time. We establish this through a new $1$-swap local search framework featuring a novel swap selection rule. At a high level, this rule ``scores'' every possible swap, based on both its modification to the clustering and its improvement to the clustering objective, and then selects those high-scoring swaps. To implement this, we design a new data structure for maintaining approximate nearest neighbors with amortized guarantees tailored to our framework.
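For readers unfamiliar with 1-swap local search, the sketch below shows the textbook baseline that the framework above speeds up. It is not the paper's algorithm: it tries every (center, point) swap exhaustively and recomputes the cost from scratch, and it contains neither the swap scoring rule nor the approximate nearest neighbor data structure. The function names, the `eps` improvement threshold, and the restriction of centers to input points are assumptions made for illustration.

```python
import random


def kmeans_cost(points, centers):
    """Sum of squared Euclidean distances from each point to its nearest center."""
    return sum(
        min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
        for p in points
    )


def one_swap_local_search(points, k, eps=1e-3, seed=0):
    """Naive 1-swap local search with centers restricted to input points.

    A swap replaces one current center with one non-center point and is
    accepted only if it lowers the cost below (1 - eps) times the current
    cost. Every candidate swap is evaluated with a full cost recomputation,
    so this baseline is far from almost-linear time.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # arbitrary initial solution (assumption)
    cost = kmeans_cost(points, centers)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for p in points:
                if p in centers:
                    continue
                # Candidate solution: swap out center i, swap in point p.
                candidate = centers[:i] + [p] + centers[i + 1:]
                new_cost = kmeans_cost(points, candidate)
                if new_cost < (1 - eps) * cost:
                    centers, cost = candidate, new_cost
                    improved = True
    return centers, cost


if __name__ == "__main__":
    # Two well-separated groups of three points each.
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(one_swap_local_search(data, k=2))
```

The multiplicative threshold `eps` guarantees termination, since every accepted swap shrinks the cost by a constant factor. The quadratic number of candidate swaps per pass in this baseline is the cost that a selective swap rule is meant to avoid.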