🤖 AI Summary
This paper studies the robust $k$-median clustering problem with $m$ outliers, aiming to construct compact coresets and characterize their optimal sizes across diverse metric spaces. We propose a novel data decomposition framework that, for the first time, jointly applies chaining arguments to multiple structural components, unifying the treatment of metric spaces with VC dimension or doubling dimension $d$. Our analysis yields a coreset of size $O(m) + \tilde{O}(kd\varepsilon^{-2})$ for spaces with VC or doubling dimension $d$, which is tight up to logarithmic factors, and of size $O(m\varepsilon^{-1}) + \tilde{O}(\min\{k^{4/3}\varepsilon^{-2},\, k\varepsilon^{-3}\})$ for Euclidean space, substantially improving upon the prior state of the art. Furthermore, our framework naturally extends to the $(k,z)$-clustering problem with outliers.
📝 Abstract
This paper considers coresets for the robust $k$-medians problem with $m$ outliers, and new constructions in various metric spaces are obtained. Specifically, for metric spaces with a bounded VC or doubling dimension $d$, the coreset size is $O(m) + \tilde{O}(kd\varepsilon^{-2})$, which is optimal up to logarithmic factors. For Euclidean spaces, the coreset size is $O(m\varepsilon^{-1}) + \tilde{O}(\min\{k^{4/3}\varepsilon^{-2}, k\varepsilon^{-3}\})$, improving upon a recent result by Jiang and Lou (ICALP 2025). These results also extend to robust $(k,z)$-clustering, yielding, for VC and doubling dimension, a coreset size of $O(m) + \tilde{O}(kd\varepsilon^{-2z})$ with the optimal linear dependence on $m$. This extended result improves upon the earlier work of Huang et al. (SODA 2025). The techniques introduce novel dataset decompositions, enabling chaining arguments to be applied jointly across multiple components.
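For readers unfamiliar with the objective, the following is a minimal sketch of the standard robust $(k,z)$-clustering cost with $m$ outliers that a coreset is meant to approximate. It is not the paper's construction. It assumes Euclidean points given as coordinate tuples, and the function name `robust_kz_cost` is hypothetical.

```python
import math

def robust_kz_cost(points, centers, m, z=1):
    """Sum of dist(p, centers)^z over all points except the m most costly ones.

    z = 1 gives the robust k-median cost; general z gives robust (k, z)-clustering.
    Assumes Euclidean distance (an illustrative choice, not fixed by the abstract).
    """
    # Cost of each point: distance to its nearest center, raised to the power z.
    costs = sorted(min(math.dist(p, c) for c in centers) ** z for p in points)
    # Dropping the m largest costs is the optimal choice of outliers for fixed centers.
    keep = max(len(costs) - m, 0)
    return sum(costs[:keep])
```

With points `[(0, 0), (1, 0), (10, 0), (100, 0)]`, centers `[(0, 0), (10, 0)]`, and `m = 1`, the far point at `(100, 0)` is excluded as the outlier, so only the point at `(1, 0)` contributes to the cost.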