🤖 AI Summary
This work addresses a critical vulnerability in existing machine unlearning methods: when models are deployed with low-bit quantization, the mismatch between unlearning precision and deployment precision can restore supposedly forgotten data, compromising privacy. The study presents what it describes as the first systematic study of this effect, names it the quantization recovery attack (QRA), and identifies the FA-RA-Q-INT4 trilemma: no evaluated method simultaneously achieves strong forgetting (FA), high utility (RA), and robustness to INT4 quantization (Q-INT4). To resolve this, the authors propose DURABLEUN-SAF (Sharpness-Aware Forgetting), a quantization-aware unlearning objective that uses straight-through estimator (STE) gradients through INT4 rounding in the NF4+LoRA regime. The method unlearns stably across BF16, INT8, and INT4 precisions, reaching Q-INT4 = 0.043 ± 0.002 with a certification rate of 3/3 (100%). It is the only evaluated method to obtain a stable empirical (0.047, {BF16, INT8, INT4})-durability certificate.
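The summary mentions straight-through estimator gradients through INT4 rounding. The sketch below illustrates only the general STE idea, under stated assumptions: it uses symmetric uniform INT4 (integer levels -8 to 7 with one fixed scale) instead of the non-uniform NF4 codebook, and a toy squared-error loss stands in for the paper's sharpness-aware forgetting objective, which is not reproduced here. All function names are hypothetical.

```python
from typing import List


def int4_fake_quant(w: List[float], scale: float) -> List[float]:
    """Simulate symmetric uniform INT4 (an assumption; NF4 is non-uniform).

    Round each weight to an integer level, clamp to [-8, 7], then rescale.
    Python's round() uses round-half-to-even.
    """
    return [max(-8, min(7, round(x / scale))) * scale for x in w]


def ste_backward(w: List[float], scale: float, grad_q: List[float]) -> List[float]:
    """Clipped straight-through estimator.

    Rounding has zero gradient almost everywhere, so STE treats it as the
    identity: the gradient w.r.t. the quantized weights is passed to the
    full-precision weights unchanged, except where a weight falls outside
    the representable range (and is clamped), where it is zeroed.
    """
    lo, hi = -8 * scale, 7 * scale
    return [g if lo <= x <= hi else 0.0 for x, g in zip(w, grad_q)]


def train_on_quantized(w, target, scale=0.1, lr=0.1, steps=50):
    """Toy quantization-aware loop (hypothetical loss, not DURABLEUN-SAF).

    The loss sum((q(w) - target)^2) is evaluated on the INT4 weights,
    but updates are applied to the full-precision copy via STE.
    """
    w = list(w)
    for _ in range(steps):
        q = int4_fake_quant(w, scale)
        grad_q = [2.0 * (qi - ti) for qi, ti in zip(q, target)]  # dL/dq
        grad_w = ste_backward(w, scale, grad_q)  # STE: dL/dw taken as dL/dq
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
    return w


if __name__ == "__main__":
    w_final = train_on_quantized([0.0, 0.0], target=[0.3, -0.5])
    print(int4_fake_quant(w_final, 0.1))
```

Because rounding is piecewise constant, differentiating through `int4_fake_quant` directly would give zero gradient almost everywhere; STE is the standard workaround that lets an objective "see" the deployed low-bit weights while still updating the full-precision ones.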
📝 Abstract
Machine unlearning aims to remove specified training data to satisfy privacy regulations such as GDPR. However, existing evaluations assume identical precision at unlearning and deployment, overlooking that production LLMs are deployed at low-bit precision. We show that INT4 quantization systematically restores forgotten content even when models pass compliance audits at bfloat16 (BF16); we term this the quantization recovery attack (QRA). We conduct the first systematic study of unlearning robustness under adapter-space INT4 quantization in the NF4+LoRA regime, evaluating seven methods on LLaMA-3-8B-Instruct across TOFU, MUSE-News, and WikiBio-WPU. INT8 is benign; INT4 induces recovery of up to 22×, and recovery worsens with dataset difficulty. We identify the FA-RA-Q-INT4 trilemma: no method simultaneously achieves strong forgetting, high utility, and quantization robustness. A dense Pareto sweep reveals a sharp phase transition: once robustness is achieved, retain accuracy collapses regardless of further tuning. To address this, we propose DURABLEUN-SAF (Sharpness-Aware Forgetting), a quantization-aware objective using Straight-Through Estimator gradients through INT4 rounding. DURABLEUN-SAF is the only method to achieve a stable empirical (0.047, {BF16, INT8, INT4})-durability certificate: Q-INT4 = 0.043 ± 0.002 and cert rate = 3/3, versus a cert rate of 1/3 for SalUn at its own published hyperparameters. We call for Q-INT4 to be adopted as a standard evaluation metric alongside FA and RA.
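The abstract reports a (0.047, {BF16, INT8, INT4})-durability certificate, a cert rate of 3/3, and recovery of up to 22×, but does not define these quantities here. The sketch below encodes one plausible reading, which is an assumption rather than the paper's definition: each run has a forget-set score per precision (lower meaning less recovered content), a run is certified when its score is at most ε at every listed precision, the cert rate counts certified runs (for example, seeds), and the recovery factor is the INT4 score divided by the BF16 score. Function names and example numbers are hypothetical and are not results from the paper.

```python
from statistics import mean, stdev
from typing import Dict, Iterable, List, Tuple

# precision name -> forget-set score (hypothetical metric; lower is assumed better)
Scores = Dict[str, float]
PRECISIONS = ("BF16", "INT8", "INT4")


def is_certified(scores: Scores, eps: float, precisions: Iterable[str] = PRECISIONS) -> bool:
    """Assumed reading: a run is (eps, precisions)-durable if its score is <= eps at every precision."""
    return all(scores[p] <= eps for p in precisions)


def cert_rate(runs: List[Scores], eps: float) -> Tuple[int, int]:
    """Return (certified runs, total runs), e.g. (3, 3) for '3/3'."""
    return sum(is_certified(r, eps) for r in runs), len(runs)


def recovery_factor(scores: Scores, low: str = "INT4", ref: str = "BF16") -> float:
    """Assumed reading of 'recovery of up to 22x': low-precision score over reference score."""
    return scores[low] / scores[ref]


def summarize(runs: List[Scores], precision: str = "INT4") -> Tuple[float, float]:
    """Mean and sample standard deviation across runs (needs at least two runs)."""
    vals = [r[precision] for r in runs]
    return mean(vals), stdev(vals)


if __name__ == "__main__":
    # Illustrative numbers only; not taken from the paper.
    runs = [
        {"BF16": 0.010, "INT8": 0.011, "INT4": 0.040},
        {"BF16": 0.012, "INT8": 0.012, "INT4": 0.045},
        {"BF16": 0.009, "INT8": 0.010, "INT4": 0.060},
    ]
    passed, total = cert_rate(runs, eps=0.047)
    print(f"cert rate = {passed}/{total}")
    m, s = summarize(runs)
    print(f"INT4 score = {m:.3f} +- {s:.3f}")
    print([round(recovery_factor(r), 1) for r in runs])
```

Under this reading, a single run whose INT4 score drifts above ε fails certification even if its BF16 score passes an audit, which is the failure mode the abstract attributes to QRA.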