🤖 AI Summary
For the Euclidean $k$-center problem with large $k$, this paper proposes an $\alpha$-coreset-based data reduction framework that constructs a compact subset of size $k \cdot o(n)$, significantly accelerating approximation algorithms. The method couples geometric coreset compression with a new efficient construction of consistent hashing for high-dimensional Euclidean spaces. Key contributions include: (1) a near-linear-time $O(1)$-approximation algorithm for $k = n^c$ with any $0 < c < 1$; (2) a consistent hashing construction with competitive parameters that supports efficient coreset computation and may be of independent interest; and (3) empirical validation on real-world datasets, where the coreset speeds up the classic Gonzalez algorithm by up to 4×, while achieving similar clustering cost. The theoretical analysis guarantees a constant approximation ratio in near-linear time, giving a favorable trade-off between computational efficiency and solution accuracy.
📝 Abstract
We study efficient algorithms for the Euclidean $k$-Center problem, focusing on the regime of large $k$. We take the approach of data reduction by considering $\alpha$-coreset, which is a small subset $S$ of the dataset $P$ such that any $\eta$-approximation on $S$ is an $(\alpha + \eta)$-approximation on $P$. We give efficient algorithms to construct coresets whose size is $k \cdot o(n)$, which immediately speeds up existing approximation algorithms. Notably, we obtain a near-linear time $O(1)$-approximation when $k = n^c$ for any $0<c<1$. We validate the performance of our coresets on real-world datasets with large $k$, and we observe that the coreset speeds up the well-known Gonzalez algorithm by up to $4$ times, while still achieving similar clustering cost. Technically, one of our coreset results is based on a new efficient construction of consistent hashing with competitive parameters. This general tool may be of independent interest for algorithm design in high-dimensional Euclidean spaces.
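To make the speedup mechanism concrete, here is a minimal sketch of the classic Gonzalez farthest-point algorithm (a well-known $2$-approximation for $k$-center) that the abstract refers to. This is not the paper's coreset construction: the point is only that any such approximation algorithm can be run on a small subset $S$ in place of the full dataset $P$, and by the $\alpha$-coreset property the resulting centers remain an $(\alpha+\eta)$-approximation on $P$. All function names below are illustrative.

```python
def dist(p, q):
    """Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def gonzalez(points, k):
    """Gonzalez's greedy k-center heuristic: repeatedly add the
    point farthest from the current set of centers. Runs in
    O(n * k) distance computations; a 2-approximation."""
    centers = [points[0]]
    # d_to_center[i] = distance from points[i] to its nearest center so far
    d_to_center = [dist(p, centers[0]) for p in points]
    for _ in range(k - 1):
        i = max(range(len(points)), key=lambda j: d_to_center[j])
        centers.append(points[i])
        d_to_center = [min(d_to_center[j], dist(points[j], points[i]))
                       for j in range(len(points))]
    return centers

def kcenter_cost(points, centers):
    """k-center objective: max distance of any point to its nearest center."""
    return max(min(dist(p, c) for c in centers) for p in points)
```

Running `gonzalez` on a coreset of size $k \cdot o(n)$ instead of all $n$ points reduces the $O(nk)$ cost accordingly, which is the source of the reported speedup; the clustering cost is then still evaluated against the full dataset $P$.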