On Tight Robust Coresets for $k$-Medians Clustering

📅 2025-07-15
🤖 AI Summary
This paper studies the robust $k$-medians clustering problem with $m$ outliers, aiming to construct compact coresets and characterize their optimal sizes across diverse metric spaces. We propose a novel data decomposition framework that, for the first time, applies chaining arguments jointly across multiple structural components, unifying the treatment of metric spaces with bounded VC dimension or doubling dimension $d$. The analysis yields a coreset size of $O(m) + \tilde{O}(kd\varepsilon^{-2})$ for spaces with VC or doubling dimension $d$, which is tight up to logarithmic factors, and $O(m\varepsilon^{-1}) + \tilde{O}(\min\{k^{4/3}\varepsilon^{-2}, k\varepsilon^{-3}\})$ for Euclidean space, improving on the prior state of the art. The framework also extends naturally to the $(k,z)$-clustering problem with outliers.

📝 Abstract
This paper considers coresets for the robust $k$-medians problem with $m$ outliers, and new constructions in various metric spaces are obtained. Specifically, for metric spaces with a bounded VC or doubling dimension $d$, the coreset size is $O(m) + \tilde{O}(kd\varepsilon^{-2})$, which is optimal up to logarithmic factors. For Euclidean spaces, the coreset size is $O(m\varepsilon^{-1}) + \tilde{O}(\min\{k^{4/3}\varepsilon^{-2}, k\varepsilon^{-3}\})$, improving upon a recent result by Jiang and Lou (ICALP 2025). These results also extend to robust $(k,z)$-clustering, yielding, for VC and doubling dimension, a coreset size of $O(m) + \tilde{O}(kd\varepsilon^{-2z})$ with the optimal linear dependence on $m$. This extended result improves upon the earlier work of Huang et al. (SODA 2025). The techniques introduce novel dataset decompositions, enabling chaining arguments to be applied jointly across multiple components.
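To make the objective concrete, here is a minimal sketch of the robust $k$-medians cost under its standard definition: sum each point's distance to its nearest center after discarding the $m$ farthest points as outliers. The function name `robust_kmedian_cost`, the unweighted input, and the use of Euclidean distance are assumptions made for illustration; this is a brute-force evaluation of the objective, not the paper's coreset construction.

```python
import math


def robust_kmedian_cost(points, centers, m):
    """Robust k-medians cost of `centers` on `points` with m outliers.

    Hypothetical helper for illustration: unweighted points, Euclidean
    distance. The m points farthest from their nearest center are
    treated as outliers and excluded from the sum.
    """
    # Distance from each point to its nearest center, in ascending order.
    dists = sorted(min(math.dist(p, c) for c in centers) for p in points)
    # Keep all but the m largest distances (never a negative count).
    keep = max(len(dists) - m, 0)
    return sum(dists[:keep])
```

For example, with points `[(0, 0), (1, 0), (10, 0), (100, 0)]` and the single center `(0, 0)`, allowing one outlier drops the point at distance 100 and leaves a cost of 11. An $\varepsilon$-coreset is a (weighted) summary on which this cost stays within a $(1 \pm \varepsilon)$ factor of the cost on the full dataset for every choice of $k$ centers.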
Problem

Research questions and friction points this paper is trying to address.

Constructs coresets for robust k-medians with outliers
Optimizes coreset size in bounded VC or doubling dimensions
Extends results to robust (k,z)-clustering with improved bounds
Innovation

Methods, ideas, or system contributions that make the work stand out.

Optimal coreset size for robust k-medians
Novel dataset decomposition techniques
Improved coreset bounds in Euclidean spaces
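The Euclidean bound takes the minimum of two terms, $k^{4/3}\varepsilon^{-2}$ and $k\varepsilon^{-3}$. A short sketch can show which one applies in a given regime. The function name `euclidean_terms` is hypothetical, and the values ignore the constants and polylogarithmic factors hidden by $\tilde{O}$, so they indicate relative growth only, not actual coreset sizes.

```python
def euclidean_terms(k, eps):
    """Return the two terms inside the min of the Euclidean bound.

    Illustrative only: constants and polylog factors hidden by the
    tilde-O notation are ignored.
    """
    first = k ** (4 / 3) / eps ** 2   # k^{4/3} * eps^{-2}
    second = k / eps ** 3             # k * eps^{-3}
    return first, second


def euclidean_dominant_term(k, eps):
    """Value of min{k^{4/3} eps^{-2}, k eps^{-3}} (up to hidden factors)."""
    return min(euclidean_terms(k, eps))
```

Dividing both terms by $k\varepsilon^{-2}$ shows that the first is smaller exactly when $k^{1/3} \le \varepsilon^{-1}$, that is, when $k \le \varepsilon^{-3}$. The $k^{4/3}\varepsilon^{-2}$ term therefore governs the high-accuracy regime (small $\varepsilon$ relative to $k$), and $k\varepsilon^{-3}$ governs the regime with many clusters.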