Textual Unlearning Gives a False Sense of Unlearning

📅 2024-06-19
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
Existing text unlearning mechanisms in language models suffer from a fundamental flaw: rather than reliably erasing sensitive memories, they inadvertently amplify membership inference and data reconstruction risks, inducing a "false unlearning" effect. Method: We propose the U-LiRA+ auditing framework and the TULA leakage attack (with both black-box and white-box variants), the first systematic approach to expose how text unlearning paradoxically increases training data exposure. Using multi-dimensional auditing, including likelihood ratio testing, membership inference, and data reconstruction, we empirically evaluate mainstream unlearning methods. Contribution/Results: Our experiments demonstrate that these methods significantly elevate attack success rates. The work fundamentally challenges the prevailing "unlearning implies security" paradigm, establishing critical security boundaries and a rigorous evaluation benchmark for trustworthy machine unlearning.

📝 Abstract
Language Models (LMs) are prone to "memorizing" training data, including substantial sensitive user information. To mitigate privacy risks and safeguard the right to be forgotten, machine unlearning has emerged as a promising approach for enabling LMs to efficiently "forget" specific texts. However, despite the good intentions, is textual unlearning really as effective and reliable as expected? To address the concern, we first propose Unlearning Likelihood Ratio Attack+ (U-LiRA+), a rigorous textual unlearning auditing method, and find that unlearned texts can still be detected with very high confidence after unlearning. Further, we conduct an in-depth investigation on the privacy risks of textual unlearning mechanisms in deployment and present the Textual Unlearning Leakage Attack (TULA), along with its variants in both black- and white-box scenarios. We show that textual unlearning mechanisms could instead reveal more about the unlearned texts, exposing them to significant membership inference and data reconstruction risks. Our findings highlight that existing textual unlearning actually gives a false sense of unlearning, underscoring the need for more robust and secure unlearning mechanisms.
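The abstract names a likelihood-ratio auditing attack (U-LiRA+) but does not spell out its mechanics. As background, the core idea of LiRA-style membership inference, on which such audits build, can be sketched as follows. This is a minimal, illustrative sketch, not the paper's actual method: the function names are invented here, and real attacks fit the IN/OUT loss distributions using many shadow models rather than the synthetic samples below.

```python
import numpy as np

def gaussian_logpdf(x, mu, sigma):
    """Log-density of a Gaussian, used to model per-example loss distributions."""
    return -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)

def lira_score(target_loss, losses_in, losses_out):
    """LiRA-style membership score (illustrative).

    Fits Gaussians to the target example's loss under shadow models that
    DID train on it (IN) and shadow models that did NOT (OUT), then returns
    the log-likelihood ratio. Positive score -> observed loss looks more
    like a member's loss; negative -> more like a non-member's.
    """
    mu_in, s_in = losses_in.mean(), losses_in.std() + 1e-8
    mu_out, s_out = losses_out.mean(), losses_out.std() + 1e-8
    return (gaussian_logpdf(target_loss, mu_in, s_in)
            - gaussian_logpdf(target_loss, mu_out, s_out))

# Synthetic demo: members typically have lower loss than non-members.
rng = np.random.default_rng(0)
losses_in = rng.normal(1.0, 0.2, size=128)   # shadow losses when example was trained on
losses_out = rng.normal(3.0, 0.5, size=128)  # shadow losses when it was held out
print(lira_score(1.0, losses_in, losses_out) > 0)  # low loss -> flagged as member
print(lira_score(3.0, losses_in, losses_out) < 0)  # high loss -> flagged as non-member
```

The paper's finding is that applying this kind of test to *unlearned* texts still separates them with high confidence, i.e. the unlearning step itself leaves a detectable statistical signature.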
Problem

Research questions and friction points this paper is trying to address.

Textual unlearning effectiveness questioned
Privacy risks in unlearning mechanisms exposed
Need for secure unlearning methods emphasized
Innovation

Methods, ideas, or system contributions that make the work stand out.

U-LiRA+ auditing method
Textual Unlearning Leakage Attack
Membership inference and reconstruction risks
Jiacheng Du
Zhejiang University
Trustworthy AI
Zhibo Wang
The State Key Laboratory of Blockchain and Data Security, Zhejiang University, China; School of Cyber Science and Technology, Zhejiang University, China
Kui Ren
Professor and Dean of Computer Science, Zhejiang University, ACM/IEEE Fellow
Data Security & Privacy; AI Security; IoT & Vehicular Security