🤖 AI Summary
This paper addresses the $k$-center clustering problem in high-dimensional spaces under severe class imbalance. Existing methods suffer from theoretical gaps and computational inefficiency. To bridge this gap, we propose the first coreset construction method with provable error bounds and introduce Choice Clustering, a novel framework integrating geometric approximation, weighted sampling, and combinatorial optimization. Our approach constructs a lightweight, weighted coreset that drastically reduces data size while preserving a $(1+\varepsilon)$-approximation to the original clustering objective. Extensive experiments on real-world image datasets, synthetic benchmarks, and imbalanced real-world data demonstrate that our method achieves an average 12.7% improvement in clustering accuracy, runs 3.8× faster than state-of-the-art baselines, and exhibits strong robustness to class skew. The framework thus establishes a new paradigm for large-scale imbalanced clustering, combining rigorous theoretical guarantees with practical efficiency.
📝 Abstract
We suggest efficient and provable methods to compute an approximation for imbalanced point clustering, that is, fitting $k$-centers to a set of points in $\mathbb{R}^d$, for any $d,k\geq 1$. To this end, we utilize \emph{coresets}, which, in the context of the paper, are essentially weighted sets of points in $\mathbb{R}^d$ that approximate the fitting loss for every model in a given set, up to a multiplicative factor of $1\pm\varepsilon$. We provide experiments (Section 3 and Section E in the appendix) that show the empirical contribution of our suggested methods for real images (novel and reference), synthetic data, and real-world data. We also propose choice clustering, which, by combining clustering algorithms, yields better performance than each one separately.
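The coreset notion described in the abstract can be written out as a formula. The following is a sketch of one common formalization, not necessarily the paper's own definition: the symbols $P$ (the input point set), $S$ (the coreset), $w$ (its weight function), $\mathcal{C}$ (the given set of models, here candidate sets of $k$ centers), and the per-point loss $\ell$ are notation assumed here for illustration.

$$
\left| \sum_{p \in P} \ell(p, C) - \sum_{q \in S} w(q)\,\ell(q, C) \right| \leq \varepsilon \sum_{p \in P} \ell(p, C) \quad \text{for every } C \in \mathcal{C}.
$$

Under this reading, the weighted loss on $S$ stays within a factor of $1\pm\varepsilon$ of the loss on $P$ for every candidate $C$, so a model that is near-optimal on the small set $S$ is also near-optimal on the full set $P$.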