🤖 AI Summary
This work systematically investigates, for the first time, how post-training quantization (PTQ) affects model privacy leakage, specifically the relationship between bit-width reduction and vulnerability to membership inference attacks (MIAs). Leveraging mainstream PTQ methods (AdaRound, BRECQ, and OBC), we evaluate privacy-utility trade-offs at 4-bit, 2-bit, and 1.58-bit precision on CIFAR-10, CIFAR-100, and TinyImageNet. Results demonstrate that low-bit PTQ substantially mitigates MIAs, reducing attack success by up to an order of magnitude at 1.58 bits, albeit at some cost in utility. To better balance privacy preservation and model utility, we propose a layer-wise strategy that quantizes only the final layer at higher precision, enabling fine-grained, controllable trade-off management (see the sketch below). Our study fills a critical gap in the privacy analysis of quantized models and offers a practical approach to model compression that jointly considers efficiency, accuracy, and privacy.
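To make the layer-wise idea concrete, here is a minimal PyTorch sketch of mixed-precision PTQ: every weight layer is ternarized (roughly 1.58 bits per weight) except the last, which is kept at a higher bit-width. The quantizers are simple magnitude-based heuristics for illustration only; the paper's actual methods (AdaRound, BRECQ, OBC) use calibration data and optimized rounding, and the helper names `quantize_ternary`, `quantize_uniform`, and `mixed_precision_ptq` are assumptions, not the authors' API.

```python
import torch
import torch.nn as nn

def quantize_ternary(weight: torch.Tensor) -> torch.Tensor:
    """Map weights to {-alpha, 0, +alpha} (~1.58 bits per weight)."""
    delta = 0.7 * weight.abs().mean()                      # magnitude threshold heuristic
    mask = weight.abs() > delta
    alpha = weight[mask].abs().mean() if mask.any() else weight.new_tensor(0.0)
    return torch.sign(weight) * mask * alpha

def quantize_uniform(weight: torch.Tensor, bits: int = 4) -> torch.Tensor:
    """Symmetric uniform quantization to the given bit-width."""
    qmax = 2 ** (bits - 1) - 1
    scale = weight.abs().max().clamp(min=1e-8) / qmax
    return torch.round(weight / scale).clamp(-qmax, qmax) * scale

def mixed_precision_ptq(model: nn.Module, last_layer_bits: int = 4) -> nn.Module:
    """Quantize all weight layers to ~1.58 bits, except the final layer,
    which keeps a higher bit-width to trade some privacy for utility."""
    layers = [m for m in model.modules() if isinstance(m, (nn.Conv2d, nn.Linear))]
    with torch.no_grad():
        for i, layer in enumerate(layers):
            if i == len(layers) - 1:
                layer.weight.copy_(quantize_uniform(layer.weight, last_layer_bits))
            else:
                layer.weight.copy_(quantize_ternary(layer.weight))
    return model
```

Raising `last_layer_bits` moves the model toward the full-precision end of the trade-off (more utility, more leakage), while lowering it pushes toward the fully ternarized end.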
📝 Abstract
Deep neural networks are widely deployed with quantization techniques to reduce memory and computational costs by lowering the numerical precision of their parameters. While quantization alters model parameters and their outputs, existing privacy analyses primarily focus on full-precision models, leaving a gap in understanding how bit-width reduction can affect privacy leakage. We present the first systematic study of the privacy-utility relationship in post-training quantization (PTQ), a versatile family of methods that can be applied to pretrained models without further training. Using membership inference attacks as our evaluation framework, we analyze three popular PTQ algorithms (AdaRound, BRECQ, and OBC) across multiple precision levels (4-bit, 2-bit, and 1.58-bit) on the CIFAR-10, CIFAR-100, and TinyImageNet datasets. Our findings consistently show that low-precision PTQ can reduce privacy leakage. In particular, lower-precision models demonstrate up to an order of magnitude reduction in membership inference vulnerability compared to their full-precision counterparts, albeit at the cost of decreased utility. Additional ablation studies at the 1.58-bit quantization level show that quantizing only the last layer at higher precision enables fine-grained control over the privacy-utility trade-off. These results offer actionable insights for practitioners seeking to balance efficiency, utility, and privacy protection in real-world deployments.
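For readers unfamiliar with how membership inference is typically scored, below is a minimal sketch of a loss-threshold attack evaluation. It is a generic baseline, not necessarily the attack used in the paper; the function `loss_threshold_mia` and the commented variable names are hypothetical, and the only assumption is that per-example losses are available for training members and held-out non-members.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def loss_threshold_mia(member_losses: np.ndarray, nonmember_losses: np.ndarray) -> float:
    """Score a simple loss-threshold membership inference attack.

    Lower loss suggests the example was seen during training, so -loss is
    used as the attack score; the ROC AUC is returned (0.5 = no leakage).
    """
    scores = np.concatenate([-member_losses, -nonmember_losses])
    labels = np.concatenate([np.ones_like(member_losses),
                             np.zeros_like(nonmember_losses)])
    return roc_auc_score(labels, scores)

# Hypothetical usage: compare leakage before and after PTQ.
# fp32_auc = loss_threshold_mia(train_losses_fp32, test_losses_fp32)
# ptq_auc  = loss_threshold_mia(train_losses_ptq,  test_losses_ptq)
```

Running such an evaluation on a full-precision model and on its quantized counterpart gives a direct, comparable measure of how much a given bit-width reduces membership inference vulnerability.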