🤖 AI Summary
This paper studies the statistical benefits of accelerated gradient methods under differential privacy and heavy-tailed data. Across three settings—non-random data, model-free random data, and parametric models (linear regression and generalized linear models)—the work proposes: (1) an accelerated Frank–Wolfe scheme built on a tailored learning rate and a uniform lower bound on the ℓ₂-norm of the gradient over the constraint set; (2) a Nesterov-accelerated projected gradient descent for objectives over ℝᵖ; and (3) a noisy-gradient framework that attains differential privacy (via the Gaussian mechanism with advanced composition) together with heavy-tailed robustness (via a geometric median-of-means gradient estimator, which also sharpens the dependence on the covariate dimension). The accelerations reduce iteration complexity, translating into stronger statistical guarantees for both empirical and population risk minimization, and comparisons with existing bounds identify scenarios where the proposed methods attain optimal convergence rates.
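For intuition, the geometric median-of-means gradient estimator named above can be sketched in a few lines. This is a minimal illustration under our own assumptions, not the paper's implementation: the names `geometric_median` and `gmom_gradient`, the Weiszfeld solver, and the block count `n_blocks` are all illustrative choices.

```python
import numpy as np

def geometric_median(points, iters=100, tol=1e-8):
    """Geometric median of the rows of `points` via Weiszfeld's algorithm."""
    y = points.mean(axis=0)
    for _ in range(iters):
        d = np.maximum(np.linalg.norm(points - y, axis=1), tol)  # avoid /0
        w = 1.0 / d
        y_new = (w[:, None] * points).sum(axis=0) / w.sum()
        if np.linalg.norm(y_new - y) < tol:
            return y_new
        y = y_new
    return y

def gmom_gradient(per_sample_grads, n_blocks):
    """Median-of-means: average gradients within blocks, then take the
    geometric median of the block means (robust to heavy-tailed samples)."""
    blocks = np.array_split(per_sample_grads, n_blocks)
    return geometric_median(np.stack([blk.mean(axis=0) for blk in blocks]))

# Example: 1000 heavy-tailed per-sample gradients in R^10.
rng = np.random.default_rng(0)
grads = rng.standard_t(df=2.0, size=(1000, 10))  # Student-t, heavy tails
robust_g = gmom_gradient(grads, n_blocks=20)
```

Because the geometric median of well-concentrated block means is itself well concentrated, a few heavy-tailed blocks cannot drag the estimate far, which is the source of the robustness.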
📝 Abstract
We study the advantages of accelerated gradient methods, specifically based on the Frank-Wolfe method and projected gradient descent, for privacy and heavy-tailed robustness. Our approaches are as follows: For the Frank-Wolfe method, our technique is based on a tailored learning rate and a uniform lower bound on the $\ell_2$-norm of the gradient over the constraint set. For accelerating projected gradient descent, we use the popular variant based on Nesterov's momentum, and we optimize our objective over $\mathbb{R}^p$. These accelerations reduce iteration complexity, translating into stronger statistical guarantees for empirical and population risk minimization. Our analysis covers three settings: non-random data, random model-free data, and parametric models (linear regression and generalized linear models). Methodologically, we approach both privacy and robustness based on noisy gradients. We ensure differential privacy via the Gaussian mechanism and advanced composition, and we achieve heavy-tailed robustness using a geometric median-of-means estimator, which also sharpens the dependence on the dimension of the covariates. Finally, we compare our rates to existing bounds and identify scenarios where our methods attain optimal convergence.
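To make the noisy-gradient template concrete, the following sketch combines a robust gradient oracle with the Gaussian mechanism inside a Frank-Wolfe loop over an $\ell_2$ ball. It is a hedged illustration, not the paper's algorithm: `private_frank_wolfe`, the clipping bound `clip`, and the noise scale `sigma` are assumptions introduced here, the classical step size $2/(t+2)$ stands in for the paper's tailored learning rate, and the advanced-composition calibration of `sigma` across the $T$ iterations is not reproduced.

```python
import numpy as np

def private_frank_wolfe(grad_oracle, x0, radius, T, clip, sigma, rng):
    """Frank-Wolfe over {x : ||x||_2 <= radius} with a privatized gradient:
    clip to bound sensitivity, then add Gaussian-mechanism noise."""
    x = x0.copy()
    for t in range(T):
        g = grad_oracle(x)                                      # e.g. gmom_gradient
        g = g * min(1.0, clip / max(np.linalg.norm(g), 1e-12))  # clipping
        g = g + rng.normal(0.0, sigma, size=g.shape)            # Gaussian mechanism
        s = -radius * g / max(np.linalg.norm(g), 1e-12)         # l2-ball LMO
        gamma = 2.0 / (t + 2.0)       # classical schedule; the paper tailors this
        x = (1.0 - gamma) * x + gamma * s
    return x

# Usage: constrained least squares over the unit l2 ball.
rng = np.random.default_rng(0)
A, b = rng.normal(size=(200, 5)), rng.normal(size=200)
oracle = lambda x: A.T @ (A @ x - b) / len(b)
x_hat = private_frank_wolfe(oracle, np.zeros(5), radius=1.0, T=100,
                            clip=5.0, sigma=0.1, rng=rng)
```

The Nesterov-accelerated projected gradient variant described in the abstract would replace the linear-minimization step with a momentum update over $\mathbb{R}^p$, using the same clipped-and-noised oracle.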