Local Search for Clustering in Almost-linear Time

📅 2025-04-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper addresses the trade-off between approximation ratio and running time in Euclidean $k$-means clustering. It proposes the first local search algorithm that achieves an $O(c)$-approximation in $\tilde{O}(n^{1+1/c})$ time, for any constant $c \ge 1$. The method introduces: (1) a new 1-swap local search framework; (2) a swap scoring rule that weighs both how much a swap changes the clustering and how much it improves the objective; and (3) a data structure for maintaining approximate nearest neighbors, which together with sparse spanners extends the algorithm to other metric spaces, including $\ell_p$, doubling, and Jaccard metrics. Compared with the prior best algorithm at the same running time, which is not based on local search and gives an $O(c^6)$-approximation, the approximation ratio improves to $O(c)$. This makes it the first local search algorithm to give a constant-factor approximation for Euclidean $k$-means in almost-linear time.

📝 Abstract

We propose the first local search algorithm for Euclidean clustering that attains an $O(1)$-approximation in almost-linear time. Specifically, for Euclidean $k$-Means, our algorithm achieves an $O(c)$-approximation in $\tilde{O}(n^{1 + 1/c})$ time, for any constant $c \ge 1$, maintaining the same running time as the previous (non-local-search-based) approach [la Tour and Saulpic, arXiv'2407.11217] while improving the approximation factor from $O(c^{6})$ to $O(c)$. The algorithm generalizes to any metric space with sparse spanners, delivering efficient constant approximation in $\ell_p$ metrics, doubling metrics, Jaccard metrics, etc. This generality derives from our main technical contribution: a local search algorithm on general graphs that obtains an $O(1)$-approximation in almost-linear time. We establish this through a new $1$-swap local search framework featuring a novel swap selection rule. At a high level, this rule "scores" every possible swap, based on both its modification to the clustering and its improvement to the clustering objective, and then selects those high-scoring swaps. To implement this, we design a new data structure for maintaining approximate nearest neighbors with amortized guarantees tailored to our framework.
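For readers unfamiliar with 1-swap local search, the following is a minimal sketch of the textbook version for $k$-means, with centers restricted to input points. It is not the paper's algorithm: the function names, the `eps` acceptance threshold, and the exhaustive scan over all swaps are assumptions made for illustration. Each pass recomputes the full cost for every candidate swap, so it is far from almost-linear; the paper's scoring rule and nearest-neighbor data structure are what replace this brute-force step.

```python
import random


def kmeans_cost(points, centers):
    """Sum of squared Euclidean distances from each point to its nearest center."""
    return sum(
        min(sum((p - q) ** 2 for p, q in zip(x, c)) for c in centers)
        for x in points
    )


def one_swap_local_search(points, k, eps=0.01, seed=0):
    """Naive 1-swap local search; centers are chosen among the input points.

    Assumption: this is the classic scheme (swap one center for one candidate
    point and accept if the cost drops by at least a (1 - eps) factor). It does
    NOT implement the paper's swap scoring rule or its data structures.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # arbitrary initial centers
    cost = kmeans_cost(points, centers)
    improved = True
    while improved:
        improved = False
        for i in range(k):  # index of the center to swap out
            for cand in points:  # candidate point to swap in
                if cand in centers:
                    continue
                trial = centers[:i] + [cand] + centers[i + 1:]
                trial_cost = kmeans_cost(points, trial)
                if trial_cost < (1 - eps) * cost:
                    centers, cost = trial, trial_cost
                    improved = True
                    break
            if improved:
                break  # restart the scan from the new solution
    return centers, cost


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(one_swap_local_search(pts, k=2))
```

The `(1 - eps)` threshold is the usual device for bounding the number of accepted swaps: every accepted swap shrinks the cost by a constant factor, so the loop cannot keep making negligible improvements.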

Problem

Research questions and friction points this paper is trying to address.

Develops local search for Euclidean clustering in almost-linear time

Improves approximation factor for $k$-Means from $O(c^6)$ to $O(c)$

Extends algorithm to various metrics using sparse spanners

Innovation

Methods, ideas, or system contributions that make the work stand out.

Local search algorithm for Euclidean clustering

1-swap local search with novel scoring rule

Data structure for approximate nearest neighbors
