🤖 AI Summary
For the Euclidean $k$-center problem with large $k$, this paper proposes an $\alpha$-coreset-based data reduction framework that constructs a compact subset of size $k \cdot o(n)$, significantly accelerating approximation algorithms. The method couples geometric coreset compression with a new efficient construction of consistent hashing for high-dimensional Euclidean spaces. Key contributions include: (1) a near-linear-time $O(1)$-approximation algorithm for $k = n^c$ with any $0 < c < 1$; (2) a consistent hashing construction with competitive parameters that supports efficient coreset computation and may be of independent interest; and (3) empirical validation on real-world datasets, where the coreset speeds up the classic Gonzalez algorithm by up to 4×, while achieving similar clustering cost. The theoretical analysis guarantees a constant approximation ratio in near-linear time, giving a favorable trade-off between computational efficiency and solution accuracy.
📝 Abstract
We study efficient algorithms for the Euclidean $k$-Center problem, focusing on the regime of large $k$. We take the approach of data reduction by considering $\alpha$-coreset, which is a small subset $S$ of the dataset $P$ such that any $\eta$-approximation on $S$ is an $(\alpha + \eta)$-approximation on $P$. We give efficient algorithms to construct coresets whose size is $k \cdot o(n)$, which immediately speeds up existing approximation algorithms. Notably, we obtain a near-linear time $O(1)$-approximation when $k = n^c$ for any $0<c<1$. We validate the performance of our coresets on real-world datasets with large $k$, and we observe that the coreset speeds up the well-known Gonzalez algorithm by up to $4$ times, while still achieving similar clustering cost. Technically, one of our coreset results is based on a new efficient construction of consistent hashing with competitive parameters. This general tool may be of independent interest for algorithm design in high-dimensional Euclidean spaces.
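To make the speedup mechanism concrete, here is a minimal sketch of the classic Gonzalez farthest-point algorithm (a well-known $2$-approximation for $k$-center) that the abstract refers to. This is not the paper's coreset construction: the point is only that any such approximation algorithm can be run on a small subset $S$ in place of the full dataset $P$, and by the $\alpha$-coreset property the resulting centers remain an $(\alpha+\eta)$-approximation on $P$. All function names below are illustrative.

```python
def dist(p, q):
    """Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def gonzalez(points, k):
    """Gonzalez's greedy k-center heuristic: repeatedly add the
    point farthest from the current set of centers. Runs in
    O(n * k) distance computations; a 2-approximation."""
    centers = [points[0]]
    # d_to_center[i] = distance from points[i] to its nearest center so far
    d_to_center = [dist(p, centers[0]) for p in points]
    for _ in range(k - 1):
        i = max(range(len(points)), key=lambda j: d_to_center[j])
        centers.append(points[i])
        d_to_center = [min(d_to_center[j], dist(points[j], points[i]))
                       for j in range(len(points))]
    return centers

def kcenter_cost(points, centers):
    """k-center objective: max distance of any point to its nearest center."""
    return max(min(dist(p, c) for c in centers) for p in points)
```

Running `gonzalez` on a coreset of size $k \cdot o(n)$ instead of all $n$ points reduces the $O(nk)$ cost accordingly, which is the source of the reported speedup; the clustering cost is then still evaluated against the full dataset $P$.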