🤖 AI Summary
This paper studies the dynamic Euclidean $k$-means clustering problem: maintaining a set of $k$ centers that approximately minimizes the $k$-means objective while the input changes through point insertions and deletions. We propose the first consistent hashing scheme with $\mathrm{poly}(d)$ running time per point evaluation in $d$-dimensional Euclidean space, coupled with novel geometric data structures that support efficient clustering maintenance and adaptive grid decomposition. Our algorithm achieves a $\mathrm{poly}(1/\varepsilon)$-approximation ratio with $\tilde{O}(k^{\varepsilon})$ update time and $\tilde{O}(1)$ recourse (center replacements per update), overcoming the high-dimensional efficiency bottleneck of prior dynamic clustering methods. To our knowledge, this is the first dynamic $k$-means algorithm whose guarantees on approximation ratio, update time, and recourse are simultaneously near-optimal when the dimension $d$ is not a fixed constant.
📝 Abstract
We consider the fundamental Euclidean $k$-means clustering problem in a dynamic setting, where the input $X \subseteq \mathbb{R}^d$ evolves over time via a sequence of point insertions/deletions. We have to explicitly maintain a solution (a set of $k$ centers) $S \subseteq \mathbb{R}^d$ throughout these updates, while minimizing the approximation ratio, the update time (time taken to handle a point insertion/deletion) and the recourse (number of changes made to the solution $S$) of the algorithm.
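To make the quantities in this problem statement concrete, the following minimal Python sketch evaluates the $k$-means objective of a center set $S$ on a point set $X$, and measures recourse between two consecutive solutions. It is an illustration only, not the algorithm of the paper: the function names (`sq_dist`, `kmeans_cost`, `recourse`) are hypothetical, the cost is computed by brute force in $O(|X| \cdot |S| \cdot d)$ time, and recourse is assumed here to be the size of the symmetric difference of the two center sets, so replacing one center counts as two changes.

```python
from typing import Iterable, Tuple

Point = Tuple[float, ...]


def sq_dist(p: Point, q: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_cost(points: Iterable[Point], centers: Iterable[Point]) -> float:
    """k-means objective: sum over points of the squared distance to the
    nearest center. Brute force; `centers` must be non-empty."""
    centers = list(centers)
    return sum(min(sq_dist(x, c) for c in centers) for x in points)


def recourse(old_centers: Iterable[Point], new_centers: Iterable[Point]) -> int:
    """Assumed recourse measure: number of centers added or removed,
    i.e. the size of the symmetric difference of the two solutions."""
    return len(set(old_centers) ^ set(new_centers))


# A tiny one-dimensional stream with k = 2.
X = {(0.0,), (1.0,), (10.0,)}
S = {(0.5,), (10.0,)}
cost_before = kmeans_cost(X, S)

# Insert a point, then move one center to keep the cost low.
X.add((11.0,))
S_new = {(0.5,), (10.5,)}
cost_after = kmeans_cost(X, S_new)
changes = recourse(S, S_new)
```

A dynamic algorithm in the sense above has to keep the cost within its approximation ratio of the optimum after every update, while spending little time per update and keeping `changes` small.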
We present a dynamic algorithm for this problem with $\text{poly}(1/\varepsilon)$-approximation ratio, $\tilde{O}(k^{\varepsilon})$ update time and $\tilde{O}(1)$ recourse. In the general regime, where the dimension $d$ cannot be assumed to be a fixed constant, our algorithm has almost optimal guarantees across all three of these parameters. Indeed, improving our update time or approximation ratio would imply beating the state-of-the-art static algorithm for this problem (which is widely believed to be the best possible), and the recourse of any dynamic algorithm must be $\Omega(1)$.
We obtain our result by building on top of the recent work of [Bhattacharya, Costa, Farokhnejad; STOC'25], which gave a near-optimal dynamic algorithm for $k$-means in general metric spaces (as opposed to in the Euclidean setting). Along the way, we design several novel geometric data structures that are of independent interest. Specifically, one of our main contributions is designing the first consistent hashing scheme [Czumaj, Jiang, Krauthgamer, Veselý, Yang; FOCS'22] that achieves $\text{poly}(d)$ running time per point evaluation with competitive parameters.