🤖 AI Summary
This work addresses the prohibitively high running time of existing $(1+\varepsilon)$-approximation algorithms for $k$-median and $k$-means clustering in low-dimensional Euclidean space. By introducing a refined geometric decomposition scheme combined with dynamic programming, the authors significantly reduce the exponential dependence in the runtime from $2^{(1/\varepsilon)^{O(d^2)}}$ to $2^{\widetilde{O}((1/\varepsilon)^{d-1})}$. Moreover, under the Gap Exponential Time Hypothesis (Gap ETH), they establish a matching conditional lower bound, showing that no $(1+\varepsilon)$-approximation algorithm can run in time $2^{o((1/\varepsilon)^{d-1})} \cdot n^{O(1)}$. This result yields the first near-linear-time $(1+\varepsilon)$-approximation algorithms for these problems and provides nearly tight complexity bounds.
📝 Abstract
The $k$-median and $k$-means objectives are classic ways to model clustering in a metric space. Given a set of points in a metric space, the goal of the $k$-median (resp. $k$-means) problem is to find $k$ representative points so as to minimize the sum of the distances (resp. sum of squared distances) from each point to its closest representative. Cohen-Addad, Feldmann, and Saulpic [JACM'21] showed how to obtain a $(1+\varepsilon)$-approximation in low-dimensional Euclidean metrics for both the $k$-median and $k$-means problems in near-linear time $2^{(1/\varepsilon)^{O(d^2)}} \cdot n \cdot \text{polylog}(n)$ (where $d$ is the dimension and $n$ is the number of input points).
We improve this running time to $2^{\widetilde{O}((1/\varepsilon)^{d-1})} \cdot n \cdot \text{polylog}(n)$, and show an almost matching lower bound: under the Gap Exponential Time Hypothesis for 3-SAT, there is no $2^{o((1/\varepsilon)^{d-1})} \cdot n^{O(1)}$-time algorithm achieving a $(1+\varepsilon)$-approximation for $k$-means.
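The two objectives above differ only in whether distances are summed directly or squared. A minimal sketch of evaluating both costs for a given set of centers (the function names and the brute-force nearest-center evaluation are illustrative, not the paper's algorithm):

```python
import math

def _dist(p, q):
    """Euclidean distance between two d-dimensional points (tuples)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def k_median_cost(points, centers):
    # k-median objective: sum of distances to each point's closest center.
    return sum(min(_dist(p, c) for c in centers) for p in points)

def k_means_cost(points, centers):
    # k-means objective: sum of *squared* distances to the closest center.
    return sum(min(_dist(p, c) ** 2 for c in centers) for p in points)

# Tiny example on the line: k = 2 centers for three points.
points = [(0.0,), (1.0,), (10.0,)]
centers = [(0.0,), (10.0,)]
print(k_median_cost(points, centers))  # only (1.0,) pays, at distance 1
print(k_means_cost(points, centers))   # squared distance is also 1
```

An approximation algorithm for either problem searches over choices of `centers` to get within a $(1+\varepsilon)$ factor of the minimum possible cost.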