Towards Benchmarking Privacy Vulnerabilities in Selective Forgetting with Large Language Models

📅 2025-12-19
📈 Citations: 0 · Influential: 0
🤖 AI Summary
Existing privacy evaluations for selective forgetting (machine unlearning) are inconsistent and often reach over-optimistic conclusions. Method: This paper introduces the first comprehensive privacy-vulnerability benchmark tailored to large language models (LLMs), proposing a standardized evaluation framework that systematically assesses mainstream unlearning methods (e.g., gradient ascent, Fisher-based, and retraining), novel privacy attacks (e.g., reconstruction and membership inference), diverse LLM architectures across scales, and heterogeneous sensitive data. Contribution/Results: The paper uncovers, for the first time, a nonlinear privacy–utility trade-off governed by forgetting strength, model capacity, and data sensitivity, challenging fragmented assessment paradigms. The authors open-source a fully reproducible toolkit; empirical results show that current unlearning methods incur an average privacy leakage rate exceeding 68% in sensitive scenarios.
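
To make the gradient-ascent baseline named above concrete, here is a minimal PyTorch sketch of the idea: take a few ascent steps on the forget set so its influence on the model degrades. This is not the paper's implementation; the function name, hyperparameters, and loss are assumptions, and a practical method would also monitor utility on retained data.

```python
# Minimal sketch of gradient-ascent unlearning (hypothetical helper, not the
# paper's code). Idea: maximize the loss on the "forget" set, i.e. descend
# on the negated loss, degrading the model's memory of that data.
import torch
import torch.nn.functional as F


def gradient_ascent_unlearn(model, forget_loader, lr=1e-5, max_steps=100):
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    model.train()
    step = 0
    for inputs, labels in forget_loader:
        if step >= max_steps:
            break
        optimizer.zero_grad()
        loss = F.cross_entropy(model(inputs), labels)
        (-loss).backward()  # ascend: push loss on the forget set upward
        optimizer.step()
        step += 1
    return model
```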

📝 Abstract
The rapid advancement of artificial intelligence (AI) has primarily focused on learning from data to build knowledgeable systems. As these systems are increasingly deployed in critical areas, ensuring their privacy and alignment with human values is paramount. Selective forgetting (also known as machine unlearning) has recently shown promise for privacy and data-removal tasks and has emerged as a transformative paradigm in AI. It refers to a model's ability to selectively erase the influence of previously seen data, which is especially important for compliance with modern data-protection regulations and for aligning models with human values. Despite its promise, selective forgetting raises significant privacy concerns, especially when the data involved come from sensitive domains. While new unlearning-induced privacy attacks are continuously proposed, each is shown to outperform its predecessors under different experimental settings, which can lead to overly optimistic and potentially unfair assessments that disproportionately favor one attack over the others. In this work, we present the first comprehensive benchmark for evaluating privacy vulnerabilities in selective forgetting. We extensively investigate the privacy vulnerabilities of machine unlearning techniques and benchmark privacy leakage across a wide range of victim data, state-of-the-art unlearning privacy attacks, unlearning methods, and model architectures. We systematically evaluate and identify critical factors behind unlearning-induced privacy leakage. With these insights, we aim to provide a standardized tool for practitioners seeking to deploy customized unlearning applications with faithful privacy assessments.
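
To illustrate the membership-inference attacks the abstract refers to, below is a minimal loss-thresholding sketch. It is a generic baseline, not a specific attack from the paper; the names and the fixed threshold are assumptions (real attacks typically calibrate against shadow or reference models).

```python
# Minimal sketch of a loss-threshold membership-inference attack against an
# unlearned model (hypothetical names). Intuition: samples whose influence
# was imperfectly removed tend to retain lower loss than truly unseen data.
import torch
import torch.nn.functional as F


@torch.no_grad()
def membership_scores(model, samples, labels):
    """Per-sample negative loss; higher score = stronger membership signal."""
    model.eval()
    losses = F.cross_entropy(model(samples), labels, reduction="none")
    return -losses


def infer_membership(model, samples, labels, threshold):
    """Flag samples whose score exceeds a (shadow-calibrated) threshold."""
    return membership_scores(model, samples, labels) > threshold
```
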
Problem

Research questions and friction points this paper is trying to address.

Benchmarking privacy vulnerabilities in selective forgetting
Evaluating privacy leakage across diverse unlearning attacks and methods
Identifying critical factors in unlearning-induced privacy risks
Innovation

Methods, ideas, or system contributions that make the work stand out.

First comprehensive benchmark of privacy vulnerabilities in selective forgetting
Systematic evaluation of unlearning-induced privacy leakage across multiple factors
A standardized tool for faithful privacy assessments in unlearning applications (see the metric sketch below)
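
As a concrete example of the kind of aggregate metric such a tool might report (cf. the 68% figure in the summary above), here is a hypothetical leakage-rate computation: the fraction of forget-set samples an attack still identifies as members after unlearning. The paper's exact metric definitions may differ.

```python
# Hypothetical aggregate leakage metric, not the paper's definition:
# fraction of "forgotten" samples an attack still flags as members.
def privacy_leakage_rate(attack_predictions):
    """attack_predictions: booleans, one per forget-set sample;
    True means the attack still identifies the sample as a member."""
    preds = list(attack_predictions)
    return sum(preds) / len(preds) if preds else 0.0


# Example: an attack that re-identifies 34 of 50 forgotten samples
# yields a leakage rate of 0.68, i.e. 68% residual leakage.
assert privacy_leakage_rate([True] * 34 + [False] * 16) == 0.68
```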