🤖 AI Summary
To address the vulnerability of large language models (LLMs) to leakage of personally identifiable information (PII) under adversarial attacks, this paper proposes Proactive Privacy Amnesia (PPA), the first cognitive science–inspired proactive privacy-forgetting mechanism, built on a "precise forgetting + semantically consistent memory implantation" paradigm. Methodologically, it identifies the memory units most strongly associated with PII via gradient sensitivity analysis, selectively erases them with a differentiable forgetting gate, and implants semantically aligned surrogate memories to preserve model functionality. Evaluated across multiple LLMs, the mechanism eliminates phone-number leakage risk entirely (100%), reduces physical-address leakage risk by 9.8%–87.6%, and incurs negligible task-performance degradation (<0.5%), significantly outperforming existing approaches that trade privacy protection against utility.
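The "gradient sensitivity analysis" step can be illustrated with a minimal sketch: rank the tokens of a memorized sequence by the gradient norm of the language-modeling loss with respect to their input embeddings, and treat the top-scoring positions as candidate "key memory" units for forgetting. The `TinyLM` model and all names below are illustrative assumptions, not the paper's actual implementation.

```python
# Hedged sketch: score tokens by gradient magnitude of the LM loss w.r.t.
# their embeddings. TinyLM is a toy stand-in for a real LLM (assumption).
import torch
import torch.nn as nn

torch.manual_seed(0)
VOCAB, DIM = 100, 16

class TinyLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, DIM)
        self.head = nn.Linear(DIM, VOCAB)

    def forward(self, embeds):
        # embeds: (T, DIM) -> logits: (T, VOCAB)
        return self.head(embeds)

def token_sensitivity(model, token_ids):
    """Per-token gradient norm of the next-token loss w.r.t. embeddings."""
    embeds = model.emb(token_ids).detach().requires_grad_(True)
    logits = model(embeds)
    # Next-token prediction: position t predicts token t+1.
    loss = nn.functional.cross_entropy(logits[:-1], token_ids[1:])
    loss.backward()
    return embeds.grad.norm(dim=-1)  # shape (T,): sensitivity per token

model = TinyLM()
seq = torch.randint(0, VOCAB, (12,))          # a memorized sequence (toy)
scores = token_sensitivity(model, seq)
top_k = scores.topk(3).indices                # candidate positions to forget
print(scores.shape, top_k.tolist())
```

In a real setting the top-ranked positions would then feed the selective-erasure and memory-implantation stages; here they are simply printed.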
📝 Abstract
With the rise of large language models (LLMs), increasing research has recognized their risk of leaking personally identifiable information (PII) under malicious attacks. Although efforts have been made to protect PII in LLMs, existing methods struggle to balance privacy protection with model utility. In this paper, inspired by cognitive-science studies of amnesia, we propose a novel approach, Proactive Privacy Amnesia (PPA), to safeguard PII in LLMs while preserving their utility. The mechanism actively identifies and forgets the key memories most closely associated with PII in a sequence, then implants suitable substitute memories to maintain the LLM's functionality. We conduct evaluations across multiple models, protecting common PII such as phone numbers and physical addresses against prevalent PII-targeted attacks, and demonstrate the superiority of our method over existing defensive techniques. The results show that PPA completely eliminates the risk of phone number exposure and significantly reduces the risk of physical address exposure by 9.8%–87.6%, all while maintaining comparable model utility.