PACZero: PAC-Private Fine-Tuning of Language Models via Sign Quantization

📅 2026-05-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses private fine-tuning of large language models at zero mutual information (I=0), a regime in which the membership-inference attack posterior success rate is bounded at the prior and which differential privacy matches only at ε=0 with infinite noise. The authors propose PACZero, a family of zeroth-order mechanisms built on the Probably Approximately Correct (PAC) Privacy framework. PACZero sign-quantizes subset-aggregated zeroth-order gradients; at steps where every candidate subset agrees on the update direction, the released sign costs zero conditional mutual information. Two variants are developed: PACZero-MI, which spends a budgeted amount of mutual information, and PACZero-ZPL, which achieves I=0 by releasing a uniform coin flip on disagreement steps. Evaluated on SST-2 and SQuAD with OPT-1.3B and OPT-6.7B in both LoRA and full-parameter settings, PACZero-ZPL reaches 88.99 ± 0.91 on SST-2 with OPT-1.3B full fine-tuning at I=0, within 2.1 percentage points of the non-private MeZO baseline (91.1), and obtains nontrivial SQuAD F1 at I=0, whereas no prior method produces usable utility in the high-privacy regime ε<1.

📝 Abstract

We introduce PACZero, a family of PAC-private zeroth-order mechanisms for fine-tuning large language models that delivers usable utility at $I(S^*; Y_{1:T})=0$. This privacy regime bounds the membership-inference attack (MIA) posterior success rate at the prior, an MIA-resistance level the DP framework matches only at $\varepsilon=0$ and infinite noise. All DP-ZO comparisons below are matched at the MIA posterior level. The key insight is that PAC Privacy charges mutual information only when the release depends on which candidate subset is the secret. Sign-quantizing subset-aggregated zeroth-order gradients creates frequent unanimity, steps at which every candidate subset agrees on the update direction; at these steps the released sign costs zero conditional mutual information. We propose two variants that span the privacy-utility trade-off: PACZero-MI (budgeted MI via exact calibration on the binary release) and PACZero-ZPL ($I=0$ via a uniform coin flip on disagreement steps). We evaluate on SST-2 and SQuAD with OPT-1.3B and OPT-6.7B in both LoRA and full-parameter tracks. On SST-2 OPT-1.3B full fine-tuning at $I=0$, PACZero-ZPL reaches ${88.99\pm0.91}$, within $2.1$pp of the non-private MeZO baseline ($91.1$ FT). No prior method produces usable utility in the high-privacy regime $\varepsilon<1$, and PACZero-ZPL obtains competitive SST-2 accuracy and nontrivial SQuAD F1 across OPT-1.3B and OPT-6.7B at $I=0$.
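The release rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `projected_grad` and `zpl_release`, the toy quadratic loss, the choice of candidate subsets, the tie-break at exactly zero, and the step size are all assumptions made for illustration. The sketch only shows the idea that a sign is released as-is when every candidate subset agrees, and is replaced by a uniform coin flip otherwise.

```python
import numpy as np


def projected_grad(loss_fn, theta, z, eps=1e-3):
    """Two-point zeroth-order estimate of the directional derivative along z."""
    return (loss_fn(theta + eps * z) - loss_fn(theta - eps * z)) / (2.0 * eps)


def zpl_release(subset_grads, rng):
    """Return (released_sign, unanimous) for one step.

    subset_grads holds one aggregated projected gradient per candidate subset.
    Assumption: a value of exactly zero is treated as a positive sign.
    """
    signs = np.where(np.asarray(subset_grads, dtype=float) >= 0.0, 1, -1)
    unanimous = bool(np.all(signs == signs[0]))
    if unanimous:
        # Every candidate subset agrees, so the sign does not depend on
        # which subset is the secret one.
        return int(signs[0]), True
    # Disagreement step: release a uniform coin flip instead.
    return int(rng.choice([-1, 1])), False


# Toy demo with hypothetical data and candidate subsets.
rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, size=(8, 4))
candidate_subsets = [[0, 1, 2, 3], [4, 5, 6, 7], [0, 2, 4, 6], [1, 3, 5, 7]]
theta = np.zeros(4)
lr = 0.05  # assumed step size
unanimous_steps = 0

for step in range(50):
    z = rng.normal(size=4)  # shared random perturbation direction
    grads = [
        projected_grad(
            lambda th, idx=idx: float(np.mean(np.sum((data[idx] - th) ** 2, axis=1))),
            theta,
            z,
        )
        for idx in candidate_subsets
    ]
    sign, unanimous = zpl_release(grads, rng)
    unanimous_steps += unanimous
    theta = theta - lr * sign * z  # sign-quantized zeroth-order update

print(f"unanimous steps: {unanimous_steps}/50")
```

In this toy setting the update uses only the released sign and the shared direction `z`, so the fraction of unanimous steps indicates how often a step carries real gradient information rather than a coin flip.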

Problem

Research questions and friction points this paper is trying to address.

privacy-preserving fine-tuning

membership inference attack

zeroth-order optimization

mutual information

language model

Innovation

Methods, ideas, or system contributions that make the work stand out.

PAC Privacy

zeroth-order optimization

sign quantization