🤖 AI Summary
This work addresses robust optimization in distributed learning under Byzantine failures, data heterogeneity, differential privacy constraints, and state-dependent heavy-tailed gradient noise. Method: We reveal that gradient clipping in SGD implicitly performs geometric median estimation, establishing for the first time its theoretical equivalence to explicit median-based gradient estimation. We propose an iterative geometric median estimation framework that unifies the analysis of gradient clipping, DP-SGD, and related methods. Crucially, convergence is proven under heavy-tailed noise without assuming bounded or light-tailed gradients. Contribution/Results: Our work introduces the first unified theoretical framework for median-based estimation applicable across multiple robust learning scenarios. It provides convergence guarantees that require neither gradient boundedness nor sub-Gaussian tail conditions, and it yields practical, implementable algorithms. The framework thus connects theoretical analysis with practical use in robust and private distributed optimization.
📝 Abstract
There are several applications of stochastic optimization where one can benefit from a robust estimate of the gradient. Examples include distributed learning with corrupted nodes, training data with large outliers, learning under privacy constraints, and heavy-tailed noise arising from the dynamics of the algorithm itself. Here we study SGD with robust gradient estimators based on estimating the median. We first consider computing the median gradient across samples, and show that the resulting method can converge even under heavy-tailed, state-dependent noise. We then derive iterative methods based on the stochastic proximal point method for computing the geometric median and generalizations thereof. Finally, we propose an algorithm that estimates the median gradient across iterations, and find that several well-known methods, in particular different forms of clipping, are special cases of this framework.
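To make the link between median estimation and clipping concrete, the following is a minimal sketch and not the authors' algorithm. It assumes the geometric median is estimated by applying a proximal point step to the sampled loss ||m - g||. That step has a standard closed form: move the estimate m toward the sample g, but by at most tau, which is a clipped update. The function names, the constant step size tau, and the toy data are all hypothetical choices made for illustration.

```python
import math
import random

def clip(v, tau):
    """Rescale v so that its Euclidean norm is at most tau."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm <= tau:
        return list(v)
    return [x * tau / norm for x in v]

def median_step(m, g, tau):
    """Proximal point step on the sampled loss ||m - g||.

    Closed form of argmin_x ||x - g|| + ||x - m||^2 / (2 * tau):
    move from m toward g, but by at most tau.
    """
    step = clip([gi - mi for gi, mi in zip(g, m)], tau)
    return [mi + si for mi, si in zip(m, step)]

def estimate_median(samples, tau=0.1, epochs=20, seed=0):
    """Stream over the samples in random order, applying median_step."""
    rng = random.Random(seed)
    m = [0.0] * len(samples[0])
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            m = median_step(m, samples[i], tau)
    return m

# Hypothetical data: 95 "honest" gradients near the origin, 5 large outliers.
rng = random.Random(1)
samples = [[rng.gauss(0, 0.1), rng.gauss(0, 0.1)] for _ in range(95)]
samples += [[1000.0, 1000.0]] * 5

mean = [sum(s[k] for s in samples) / len(samples) for k in range(2)]
median = estimate_median(samples)
print("mean:  ", mean)    # dragged far from the origin by the outliers
print("median:", median)  # stays close to the bulk of the samples
```

Because each sample can move the estimate by at most tau, a few corrupted gradients cannot drag it far, whereas they dominate the plain average. With a constant tau the estimate keeps fluctuating around the median at a scale of roughly tau; a decaying step size would be the usual remedy, which this sketch leaves out.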