Coreset for Robust Geometric Median: Eliminating Size Dependency on Outliers

📅 2025-10-28

📈 Citations: 0

✨ Influential: 0

career value

241K/year

🤖 AI Summary

This paper addresses the robust geometric median problem in Euclidean space, focusing on coreset construction that eliminates dependence on the number $ m $ of outliers—enabling more compact and adaptive data compression. The proposed method introduces a novel non-componentwise error analysis framework, the first to achieve a coreset size independent of $ m $; in one dimension, it attains the theoretical optimum and uncovers fundamental distinctions from the classical median problem. Technically, the approach integrates robust statistical analysis, geometry-aware sensitivity sampling, and $ varepsilon $-approximation theory, supporting natural extensions to diverse metric spaces. Experiments demonstrate that the algorithm achieves optimal trade-offs among accuracy, compression ratio, and runtime efficiency, while maintaining strong robustness even under adversarial outlier assumptions.

Technology Category

Application Category

📝 Abstract

We study the robust geometric median problem in Euclidean space $mathbb{R}^d$, with a focus on coreset construction.A coreset is a compact summary of a dataset $P$ of size $n$ that approximates the robust cost for all centers $c$ within a multiplicative error $varepsilon$. Given an outlier count $m$, we construct a coreset of size $ ilde{O}(varepsilon^{-2} cdot min{varepsilon^{-2}, d})$ when $n geq 4m$, eliminating the $O(m)$ dependency present in prior work [Huang et al., 2022&2023]. For the special case of $d = 1$, we achieve an optimal coreset size of $ ilde{Theta}(varepsilon^{-1/2} + frac{m}{n} varepsilon^{-1})$, revealing a clear separation from the vanilla case studied in [Huang et al., 2023; Afshani and Chris, 2024]. Our results further extend to robust $(k,z)$-clustering in various metric spaces, eliminating the $m$-dependence under mild data assumptions. The key technical contribution is a novel non-component-wise error analysis, enabling substantial reduction of outlier influence, unlike prior methods that retain them.Empirically, our algorithms consistently outperform existing baselines in terms of size-accuracy tradeoffs and runtime, even when data assumptions are violated across a wide range of datasets.

Problem

Research questions and friction points this paper is trying to address.

Eliminating coreset size dependency on outlier count

Constructing compact geometric median approximations with reduced error

Extending robust clustering results to various metric spaces

Innovation

Methods, ideas, or system contributions that make the work stand out.

Coreset eliminates size dependency on outliers

Novel non-component-wise error analysis reduces outlier influence

Extends to robust clustering in various metric spaces

🔎 Similar Papers

Improving Numerical Stability of Normalized Mutual Information Estimator on High Dimensions