🤖 AI Summary
This paper addresses the problem of efficiently computing the geometric median of a dataset under differential privacy. While existing methods achieve information-theoretically optimal sample complexity—$n \gtrsim \sqrt{d}/(\alpha\varepsilon)$ for $(\varepsilon,\delta)$-privacy and $\alpha$-multiplicative approximation—their computational cost remains prohibitively high. We propose the first nearly linear-time algorithm with complexity $\widetilde{O}(nd + d/\alpha^2)$. Our method integrates subsampling, a FriendlyCore-inspired geometric aggregation scheme, a customized sensitivity analysis for differentially private stochastic gradient descent (DP-SGD), and non-private first-order optimization techniques. Crucially, the resulting error depends on the *effective* data radius rather than the worst-case diameter, improving practical accuracy. We prove that, given the optimal sample size, our algorithm achieves $\alpha$-multiplicative approximation with high probability. To the best of our knowledge, this is the fastest known differentially private algorithm for geometric median computation, achieving state-of-the-art time complexity while preserving statistical optimality.
📝 Abstract
Estimating the geometric median of a dataset is a robust counterpart to mean estimation, and is a fundamental problem in computational geometry. Recently, [HSU24] gave an $(\varepsilon, \delta)$-differentially private algorithm obtaining an $\alpha$-multiplicative approximation to the geometric median objective, $\frac{1}{n} \sum_{i \in [n]} \|\cdot - \mathbf{x}_i\|$, given a dataset $\mathcal{D} := \{\mathbf{x}_i\}_{i \in [n]} \subset \mathbb{R}^d$. Their algorithm requires $n \gtrsim \sqrt{d} \cdot \frac{1}{\alpha\varepsilon}$ samples, which they prove is information-theoretically optimal. This result is surprising because its error scales with the *effective radius* of $\mathcal{D}$ (i.e., of a ball capturing most points), rather than the worst-case radius. We give an improved algorithm that obtains the same approximation quality, also using $n \gtrsim \sqrt{d} \cdot \frac{1}{\alpha\varepsilon}$ samples, but in time $\widetilde{O}(nd + \frac{d}{\alpha^2})$. Our runtime is nearly-linear, plus the cost of the cheapest non-private first-order method due to [CLM+16]. To achieve our results, we use subsampling and geometric aggregation tools inspired by FriendlyCore [TCK+22] to speed up the "warm start" component of the [HSU24] algorithm, combined with a careful custom analysis of DP-SGD's sensitivity for the geometric median objective.
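For context, the (non-private) objective being approximated, $\frac{1}{n} \sum_{i \in [n]} \|\cdot - \mathbf{x}_i\|$, can be minimized by Weiszfeld's classical fixed-point iteration. The sketch below is only an illustration of the objective and a standard non-private baseline, not the differentially private algorithm the paper proposes; the function names and iteration budget are our own choices.

```python
import numpy as np

def geometric_median_objective(z, X):
    """Mean Euclidean distance (1/n) * sum_i ||z - x_i|| from z to the rows of X."""
    return np.mean(np.linalg.norm(X - z, axis=1))

def weiszfeld(X, iters=100, eps=1e-9):
    """Weiszfeld's iteration: a reweighted-average fixed point for the geometric median.

    Each step replaces z by a weighted mean of the data, with weights 1/||z - x_i||,
    which is exactly one step of minimizing the objective above. Non-private baseline only.
    """
    z = X.mean(axis=0)  # warm start at the coordinate-wise mean
    for _ in range(iters):
        d = np.linalg.norm(X - z, axis=1)
        w = 1.0 / np.maximum(d, eps)  # eps guards against division by zero at a data point
        z = (w[:, None] * X).sum(axis=0) / w.sum()
    return z
```

On a small example, the iterate lands at the median-like point rather than the mean: for points $(0,0), (0,0), (10,0)$, the minimizer is the origin, while the mean sits at $(10/3, 0)$ with a strictly larger objective value.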