🤖 AI Summary
To address the vulnerability of large language models (LLMs) to leakage of personally identifiable information (PII) under adversarial attacks, this paper proposes Proactive Privacy Amnesia (PPA), the first cognitive science–inspired proactive privacy-forgetting mechanism, built on a "precise forgetting + semantically consistent memory implantation" paradigm. Methodologically, it identifies the memory units most strongly associated with PII via gradient sensitivity analysis, selectively erases them with a differentiable forgetting gate, and implants semantically aligned surrogate memories to preserve model functionality. Evaluated across multiple LLMs, the mechanism eliminates phone-number leakage risk entirely (100%), reduces physical-address leakage risk by 9.8%–87.6%, and incurs negligible task-performance degradation (<0.5%), significantly outperforming existing approaches that trade privacy protection against utility.
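The "gradient sensitivity analysis" step can be illustrated with a minimal sketch: rank the tokens of a memorized sequence by the gradient norm of the language-modeling loss with respect to their input embeddings, and treat the top-scoring positions as candidate "key memory" units for forgetting. The `TinyLM` model and all names below are illustrative assumptions, not the paper's actual implementation.

```python
# Hedged sketch: score tokens by gradient magnitude of the LM loss w.r.t.
# their embeddings. TinyLM is a toy stand-in for a real LLM (assumption).
import torch
import torch.nn as nn

torch.manual_seed(0)
VOCAB, DIM = 100, 16

class TinyLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, DIM)
        self.head = nn.Linear(DIM, VOCAB)

    def forward(self, embeds):
        # embeds: (T, DIM) -> logits: (T, VOCAB)
        return self.head(embeds)

def token_sensitivity(model, token_ids):
    """Per-token gradient norm of the next-token loss w.r.t. embeddings."""
    embeds = model.emb(token_ids).detach().requires_grad_(True)
    logits = model(embeds)
    # Next-token prediction: position t predicts token t+1.
    loss = nn.functional.cross_entropy(logits[:-1], token_ids[1:])
    loss.backward()
    return embeds.grad.norm(dim=-1)  # shape (T,): sensitivity per token

model = TinyLM()
seq = torch.randint(0, VOCAB, (12,))          # a memorized sequence (toy)
scores = token_sensitivity(model, seq)
top_k = scores.topk(3).indices                # candidate positions to forget
print(scores.shape, top_k.tolist())
```

In a real setting the top-ranked positions would then feed the selective-erasure and memory-implantation stages; here they are simply printed.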
📝 Abstract
With the rise of large language models (LLMs), increasing research has recognized their risk of leaking personally identifiable information (PII) under malicious attacks. Although efforts have been made to protect PII in LLMs, existing methods struggle to balance privacy protection with model utility. In this paper, inspired by cognitive-science studies of amnesia, we propose a novel approach, Proactive Privacy Amnesia (PPA), to safeguard PII in LLMs while preserving their utility. The mechanism actively identifies and forgets the key memories most closely associated with PII in a sequence, then implants suitable substitute memories to maintain the LLM's functionality. We conduct evaluations across multiple models, protecting common PII such as phone numbers and physical addresses against prevalent PII-targeted attacks, and demonstrate the superiority of our method over existing defensive techniques. The results show that PPA completely eliminates the risk of phone number exposure and significantly reduces the risk of physical address exposure by 9.8%–87.6%, all while maintaining comparable model utility.