🤖 AI Summary
This work addresses the privacy risk of fine-tuning pretrained models on small, sensitive datasets, where the model can memorize individual records and later leak them. The authors propose a differentially private fine-tuning algorithm based on the exponential mechanism. Its key idea is a utility function that combines a local quadratic approximation of the pretrained model with information from the new data. Because this utility is quadratic, the resulting exponential mechanism can be sampled exactly from a closed-form multivariate normal distribution, and a random-projection strategy makes the approach scalable to high-dimensional models. The method is supported by theoretical privacy guarantees, sensitivity bounds, and accuracy estimates. Experiments on MNIST and the MIMIC clinical dataset show performance competitive with existing differentially private fine-tuning methods.
📝 Abstract
Fine-tuning adapts a pretrained machine learning model to a small, sensitive dataset, but this process risks memorizing individual new data points, making the model vulnerable to adversaries who seek to extract sensitive information. In this work, we develop a randomized algorithm based on the exponential mechanism for fine-tuning while ensuring differential privacy. Our key idea is to construct a simple utility function that combines a local quadratic approximation of the pretrained model with information from the new dataset. The resulting exponential mechanism admits exact sampling from a multivariate normal distribution in closed form. We establish theoretical privacy guarantees, sensitivity bounds, and accuracy estimations for our method. We further introduce a random-projection strategy that makes the approach scalable to high-dimensional models. Numerical experiments on the MNIST benchmark and the MIMIC clinical dataset demonstrate competitive performance against existing differentially private fine-tuning techniques.
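To illustrate why a quadratic utility leads to exact Gaussian sampling, here is a minimal sketch in Python. It is not the authors' implementation. It assumes a hypothetical utility u(theta) = -(1/2 (theta - theta0)^T H (theta - theta0) + g^T (theta - theta0)), where theta0 is the pretrained parameter vector, H is a positive definite curvature matrix of the pretrained loss at theta0, and g is a gradient computed from the new data. The exponential mechanism samples with density proportional to exp(epsilon * u(theta) / (2 * sensitivity)). Completing the square shows that this density is a multivariate normal with mean theta0 - H^{-1} g and covariance (2 * sensitivity / epsilon) * H^{-1}. The function name, the form of the utility, and the treatment of `sensitivity` as a given input are all assumptions of this sketch; the abstract does not state the paper's exact utility or sensitivity bound.

```python
import numpy as np


def sample_quadratic_exponential_mechanism(
    theta0, H, g, epsilon, sensitivity, n_samples, rng
):
    """Sample from p(theta) proportional to exp(epsilon * u(theta) / (2 * sensitivity)).

    Assumed (hypothetical) utility, with d = theta - theta0:
        u(theta) = -(0.5 * d^T H d + g^T d)
    H must be symmetric positive definite. `sensitivity` is taken as given;
    this sketch does not derive or verify a sensitivity bound.
    """
    scale = epsilon / (2.0 * sensitivity)

    # Completing the square: the mean is theta0 - H^{-1} g.
    mean = theta0 - np.linalg.solve(H, g)

    # Precision matrix of the Gaussian is scale * H = L L^T.
    L = np.linalg.cholesky(scale * H)

    # If z ~ N(0, I), then L^{-T} z has covariance (L L^T)^{-1}.
    z = rng.standard_normal((theta0.shape[0], n_samples))
    samples = mean[:, None] + np.linalg.solve(L.T, z)
    return mean, samples.T


# Toy usage with made-up numbers (not data from the paper).
rng = np.random.default_rng(0)
theta0 = np.zeros(3)
H = np.diag([1.0, 2.0, 4.0])
g = np.array([0.5, -1.0, 2.0])

mean, samples = sample_quadratic_exponential_mechanism(
    theta0, H, g, epsilon=1.0, sensitivity=0.1, n_samples=5, rng=rng
)
print("Gaussian mean:", mean)
print("private draws:\n", samples)
```

In this sketch a smaller `epsilon` or a larger `sensitivity` inflates the covariance, so each draw lies further from the mean on average. This reflects the usual privacy-utility trade-off of the exponential mechanism.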