🤖 AI Summary
This work addresses a geometric mismatch in differentially private optimization: DP-SGD injects isotropic noise, while the loss landscapes of deep neural networks are highly anisotropic. Existing preconditioning methods either consume privacy budget by estimating curvature from private data or suffer from distribution shift when they rely on public data. To overcome this, the authors propose DP-KFC, which builds KFAC preconditioners without any real data by probing the network with structured synthetic noise. The method rests on the observation that the Fisher Information Matrix decouples into an architectural-sensitivity component, recoverable via synthetic noise, and an input-correlation component, approximable from modality-specific frequency statistics. Curvature is therefore estimated without accessing either private or public data. In strong privacy regimes (ε ≤ 3), DP-KFC consistently outperforms DP-SGD and adaptive baselines across diverse modalities, matching private-data preconditioners while avoiding the degradation of up to 4.8% observed for public-data variants.
📝 Abstract
Differentially private optimization suffers from a fundamental geometric mismatch: deep networks have highly anisotropic loss landscapes, yet DP-SGD injects isotropic noise. Second-order preconditioning can resolve this, but estimating curvature typically requires private data (consuming privacy budget) or public data (introducing distribution shift). We show that the Fisher Information Matrix decouples into architectural sensitivity, recoverable via synthetic noise, and input correlations, approximable from modality-specific frequency statistics. We propose DP-KFC, which constructs KFAC preconditioners by probing networks with structured synthetic noise, requiring neither private nor public data. Empirically, DP-KFC consistently outperforms DP-SGD and adaptive baselines across diverse modalities in strong privacy regimes ($\varepsilon \leq 3$). DP-KFC matches private-data preconditioners while public-data variants degrade by up to $4.8\%$, showing that curvature can be estimated without consuming privacy budget or introducing distribution shift. This enables privacy-preserving learning in specialized domains (e.g., medical applications) where regulatory constraints make data scarce.
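To make the idea concrete, here is a minimal NumPy sketch of a data-free KFAC preconditioner for a single softmax linear layer. It is an illustration under stated assumptions, not the authors' implementation: the function names (`pink_noise`, `kfac_factors`, `dp_kfac_step`), the 1/f^alpha spectrum used as a stand-in for "modality-specific frequency statistics", the damping value, and the choice to apply the preconditioner after clipping and noising are all hypothetical. The per-sample gradients in the demo are random placeholders rather than gradients of a real loss.

```python
import numpy as np

def pink_noise(rng, n, d, alpha=1.0):
    """Draw n synthetic probes of length d with an assumed 1/f^alpha power spectrum."""
    freqs = np.fft.rfftfreq(d)
    freqs[0] = freqs[1]                      # avoid dividing by zero at the DC bin
    scale = freqs ** (-alpha / 2.0)          # amplitude = sqrt(power)
    shape = (n, freqs.size)
    spectrum = (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) * scale
    x = np.fft.irfft(spectrum, n=d, axis=1)  # back to the signal domain
    return x / x.std()

def kfac_factors(W, probes):
    """KFAC factors for logits = W @ a, estimated from synthetic probes only."""
    logits = probes @ W.T
    logits = logits - logits.max(axis=1, keepdims=True)
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)
    n = probes.shape[0]
    A = probes.T @ probes / n                # input second moment, (in, in)
    # Exact softmax Fisher w.r.t. the logits, averaged over probes: E[diag(p) - p p^T]
    G = np.diag(p.mean(axis=0)) - p.T @ p / n
    return A, G

def dp_kfac_step(W, per_sample_grads, A, G, rng,
                 clip=1.0, sigma=1.0, lr=0.1, damping=1e-2):
    """Clip, add isotropic Gaussian noise, then precondition with G^-1 (.) A^-1."""
    n = per_sample_grads.shape[0]
    norms = np.linalg.norm(per_sample_grads.reshape(n, -1), axis=1)
    factors = np.minimum(1.0, clip / (norms + 1e-12))
    clipped = per_sample_grads * factors[:, None, None]
    noisy = (clipped.sum(axis=0) + sigma * clip * rng.standard_normal(W.shape)) / n
    # A and G never touch real data, so this step is post-processing of the noisy gradient.
    A_inv = np.linalg.inv(A + damping * np.eye(A.shape[0]))
    G_inv = np.linalg.inv(G + damping * np.eye(G.shape[0]))
    return W - lr * G_inv @ noisy @ A_inv

rng = np.random.default_rng(0)
W = 0.1 * rng.standard_normal((3, 16))           # 3 classes, 16 input features
probes = pink_noise(rng, 256, 16)
A, G = kfac_factors(W, probes)
fake_grads = rng.standard_normal((8, 3, 16))     # placeholder per-sample gradients
W_new = dp_kfac_step(W, fake_grads, A, G, rng)
```

Because the factors `A` and `G` are computed from synthetic probes alone, building them spends no privacy budget in this sketch; only the clipped, noised gradient depends on private data. The paper may order the clipping, noising, and preconditioning steps differently.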