Faster Approximation Algorithms for k-Center via Data Reduction

📅 2025-02-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
For the Euclidean $k$-center problem with large $k$, this paper proposes a data reduction framework based on $\alpha$-coresets: it constructs a small subset of size $k \cdot o(n)$ on which existing approximation algorithms run faster. One of the coreset constructions relies on a new efficient construction of consistent hashing for high-dimensional Euclidean spaces. Key contributions include: (1) a near-linear-time $O(1)$-approximation algorithm when $k = n^c$ for any $0 < c < 1$; (2) an efficient consistent hashing construction with competitive parameters, which may be of independent interest for algorithm design; and (3) experiments on real-world datasets with large $k$ in which the coreset speeds up the classic Gonzalez algorithm by up to 4× while achieving similar clustering cost.

📝 Abstract
We study efficient algorithms for the Euclidean $k$-Center problem, focusing on the regime of large $k$. We take the approach of data reduction by considering $\alpha$-coreset, which is a small subset $S$ of the dataset $P$ such that any $\beta$-approximation on $S$ is an $(\alpha + \beta)$-approximation on $P$. We give efficient algorithms to construct coresets whose size is $k \cdot o(n)$, which immediately speeds up existing approximation algorithms. Notably, we obtain a near-linear time $O(1)$-approximation when $k = n^c$ for any $0<c<1$. We validate the performance of our coresets on real-world datasets with large $k$, and we observe that the coreset speeds up the well-known Gonzalez algorithm by up to $4$ times, while still achieving similar clustering cost. Technically, one of our coreset results is based on a new efficient construction of consistent hashing with competitive parameters. This general tool may be of independent interest for algorithm design in high dimensional Euclidean spaces.
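For readers unfamiliar with the baseline named in the abstract, the following is a minimal sketch of Gonzalez's farthest-first traversal, the classic 2-approximation for $k$-center. It is not taken from the paper; the function name `gonzalez` and the choice of the first input point as the initial center are assumptions made for illustration. Each of the $k$ rounds scans all $n$ points, so the running time is $O(nk)$ distance computations, which is the cost that becomes significant when $k$ is large.

```python
import math


def gonzalez(points, k):
    """Farthest-first traversal (illustrative sketch, not the paper's code).

    points: list of equal-length coordinate tuples.
    Returns (center_indices, cost), where cost is the maximum distance
    from any point to its nearest chosen center.
    """
    if not points or k <= 0:
        raise ValueError("need at least one point and k >= 1")
    k = min(k, len(points))

    centers = [0]  # assumption: start from the first point (any point works)
    # dist[i] = distance from points[i] to its nearest chosen center so far
    dist = [math.dist(p, points[0]) for p in points]

    for _ in range(k - 1):
        # Pick the point that is currently farthest from all chosen centers.
        far = max(range(len(points)), key=dist.__getitem__)
        centers.append(far)
        # Update nearest-center distances with the new center.
        for i, p in enumerate(points):
            d = math.dist(p, points[far])
            if d < dist[i]:
                dist[i] = d

    return centers, max(dist)
```

Running the same routine on a smaller subset $S$ instead of the full dataset $P$ reduces the per-round scan from $|P|$ to $|S|$ points, which is the kind of speedup a coreset is meant to provide.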
Problem

Research questions and friction points this paper is trying to address.

Efficient algorithms for Euclidean k-Center with large k
Constructing small coresets to speed up approximation
Achieving near-linear time constant-factor approximation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Data reduction via $\alpha$-coresets
Efficient coreset construction algorithm
Consistent hashing for high dimensions
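To make the data reduction idea concrete, here is a deliberately simple sketch that buckets points into an axis-aligned grid and keeps one representative per cell. This is plain grid hashing, swapped in for illustration only; it is not the paper's coreset construction or its consistent hashing scheme, and the names `grid_reduce` and `kcenter_cost` and the parameter `cell` are hypothetical. Because every point lies in the same cell as its representative, it is within `cell * sqrt(d)` of it, so by the triangle inequality any set of centers has cost on $P$ at most its cost on $S$ plus `cell * sqrt(d)`. Note that this is an additive bound in distance, not the $(\alpha + \beta)$ approximation-ratio guarantee defined in the abstract.

```python
import math


def grid_reduce(points, cell):
    """Keep one representative per grid cell of side length `cell`.

    Illustrative grid bucketing only; not the paper's construction.
    """
    reps = {}
    for p in points:
        # Cell index of p along each coordinate axis.
        key = tuple(math.floor(x / cell) for x in p)
        reps.setdefault(key, p)  # first point seen in a cell represents it
    return list(reps.values())


def kcenter_cost(points, centers):
    """Maximum distance from any point to its nearest center."""
    return max(min(math.dist(p, c) for c in centers) for p in points)


# Usage: reduce P to S, pick centers using S only, then evaluate on P.
P = [(0.1, 0.1), (0.4, 0.3), (5.2, 5.1), (5.9, 5.8)]
cell = 1.0
S = grid_reduce(P, cell)
centers = S  # here S is small enough that every representative is a center
slack = cell * math.sqrt(len(P[0]))
assert kcenter_cost(P, centers) <= kcenter_cost(S, centers) + slack
```

Plain grids degrade quickly as the dimension grows, since the cell diameter scales with $\sqrt{d}$ and the number of occupied cells can approach $n$; this is one reason a more careful hashing scheme is needed in high-dimensional spaces.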