🤖 AI Summary
This work addresses a problem in differentially private (DP) training: DP gradient noise biases the second-moment estimate of adaptive optimizers such as AdamW, and temporal filtering used for denoising changes the noise statistics, so bias corrections derived for unfiltered noise become miscalibrated. To mitigate this bias–calibration mismatch, the paper proposes FiBeR, an optimizer that introduces, for the first time, a filter-aware DP noise calibration mechanism. FiBeR performs denoising in innovation space, decouples the observation geometry from the innovation gain, and computes the noise attenuation in closed form, an approach that extends to general stable linear filters. This substantially alleviates second-moment estimation bias and achieves state-of-the-art performance for DP models across multiple vision and language benchmarks.
📝 Abstract
Differentially private (DP) training protects individual examples by adding noise to gradients, but the injected noise interacts nontrivially with adaptive optimizers. Recent DP methods temporally filter privatized gradients to reduce variance; however, filtering also changes the DP noise statistics seen by AdamW's second-moment accumulator. As a result, bias corrections derived for unfiltered DP noise, such as subtracting $\sigma_w^2$, can become miscalibrated when filtering is present.
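The miscalibration can be illustrated with a small simulation. The sketch below is not FiBeR: it assumes a plain first-order exponential moving average as a stand-in filter, and the names `ema_filter`, `sigma_w`, and `omega`, along with their values, are hypothetical choices made for illustration. It passes white Gaussian noise through the filter and shows that the filtered noise power falls well below the raw noise variance, so subtracting the unfiltered variance would overcorrect.

```python
import random

def ema_filter(xs, omega):
    """First-order EMA: y_t = (1 - omega) * y_{t-1} + omega * x_t, starting from y = 0."""
    y, out = 0.0, []
    for x in xs:
        y = (1.0 - omega) * y + omega * x
        out.append(y)
    return out

def mean_square(xs):
    """Empirical second moment of a sequence."""
    return sum(x * x for x in xs) / len(xs)

rng = random.Random(0)
sigma_w = 1.0   # hypothetical DP noise standard deviation
omega = 0.2     # hypothetical filter gain

# Pure noise stream (zero true gradient), so any second moment is noise power.
noise = [rng.gauss(0.0, sigma_w) for _ in range(200_000)]
filtered = ema_filter(noise, omega)[1000:]  # drop the warm-up transient

raw_power = mean_square(noise)            # close to sigma_w ** 2
filtered_power = mean_square(filtered)    # close to omega / (2 - omega) * sigma_w ** 2
naive_corrected = filtered_power - sigma_w ** 2  # unfiltered correction: goes negative

print(raw_power, filtered_power, naive_corrected)
```

For this stand-in filter, the stationary noise power is a fraction omega / (2 - omega) of the input variance, which is why the unfiltered correction drives the estimate below zero.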
We propose FiBeR, a DP optimizer designed for temporally filtered privatized gradients. FiBeR (i) performs denoising in innovation space by filtering the residual stream and integrating it to form the filtered gradient estimate, (ii) decouples the two-point observation geometry from the innovation gain to enable independent tuning, and (iii) introduces a filter-aware second-moment calibration that subtracts the attenuated DP noise contribution $A(\omega)\,\sigma_w^2$, where $A(\omega)$ is derived in closed form for the innovation filter and can be computed for general stable linear filters.
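One standard way to obtain a noise attenuation factor for a stable linear filter is to sum the squared taps of its impulse response, which gives the output variance per unit of white input variance. The sketch below applies that textbook fact; it does not reproduce the paper's closed form for the innovation filter. The function names, the coefficient convention, the truncation length `n_terms`, and the clamp at zero are all assumptions made for illustration.

```python
def noise_attenuation(b, a, n_terms=10_000):
    """Approximate sum of squared impulse-response taps for the filter
    a[0] * y[t] = sum_i b[i] * x[t - i] - sum_{j >= 1} a[j] * y[t - j].
    Truncating at n_terms assumes the filter is stable (taps decay to zero)."""
    h = []
    for t in range(n_terms):
        acc = b[t] if t < len(b) else 0.0   # response to a unit impulse at t = 0
        for j in range(1, len(a)):
            if t - j >= 0:
                acc -= a[j] * h[t - j]
        h.append(acc / a[0])
    return sum(v * v for v in h)

def calibrated_second_moment(v_filtered, attenuation, sigma_w):
    """Subtract the attenuated noise power; the clamp at zero is an assumption."""
    return max(v_filtered - attenuation * sigma_w ** 2, 0.0)

omega = 0.2  # hypothetical gain of a first-order EMA, used only as an example
# EMA y_t = (1 - omega) * y_{t-1} + omega * x_t in (b, a) form.
A_ema = noise_attenuation([omega], [1.0, -(1.0 - omega)])
A_closed_form = omega / (2.0 - omega)   # known closed form for this EMA

# A 4-tap moving average attenuates white-noise variance by a factor of 4.
A_boxcar = noise_attenuation([0.25] * 4, [1.0])

print(A_ema, A_closed_form, A_boxcar)
print(calibrated_second_moment(0.5, A_ema, sigma_w=1.0))
```

For the EMA the numerical sum matches omega / (2 - omega), and an identity filter returns an attenuation of 1, which recovers the unfiltered correction as a special case.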
Across vision and language benchmarks, FiBeR consistently and substantially improves on existing DP optimizers, surpassing prior state-of-the-art results under equivalent privacy constraints on multiple tasks.