Large Language Models are Advanced Anonymizers

📅 2024-02-21
🏛️ arXiv.org
📈 Citations: 7
Influential: 3
🤖 AI Summary
Existing anonymization methods fail to withstand strong inference attacks by large language models (LLMs) on online text, leading to severe privacy breaches. To address this, we propose the first LLM-aware anonymization paradigm designed specifically to counter LLM-driven inference. Our method introduces an LLM-powered adversarial anonymization framework: it explicitly models the attacker's capabilities using LLMs and jointly optimizes controllable text generation and semantic preservation, improving privacy protection and textual utility together. We also design the first benchmark tailored to evaluating anonymization robustness against LLM-based attackers. Extensive experiments on real-world and synthetic datasets show that our approach significantly outperforms state-of-the-art industrial anonymization tools on two critical dimensions, resistance to LLM inference (privacy) and performance on downstream NLP tasks (utility), establishing the first robust anonymization solution effective against LLM-level adversaries.

📝 Abstract
Recent work in privacy research on large language models has shown that they achieve near human-level performance at inferring personal data from real-world online texts. With consistently increasing model capabilities, existing text anonymization methods are currently lagging behind regulatory requirements and adversarial threats. This raises the question of how individuals can effectively protect their personal data when sharing texts online. In this work, we take two steps to answer this question: We first present a new setting for evaluating anonymization in the face of adversarial LLM inferences, allowing for a natural measurement of anonymization performance while remedying some of the shortcomings of previous metrics. We then present our LLM-based adversarial anonymization framework, which leverages the strong inferential capabilities of LLMs to inform our anonymization procedure. In our experimental evaluation, we show on real-world and synthetic online texts how adversarial anonymization outperforms current industry-grade anonymizers in terms of both the resulting utility and privacy.
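The adversarial anonymization procedure described in the abstract alternates between an attacker model inferring personal attributes and an anonymizer rewriting the text until nothing more can be inferred. The sketch below illustrates that loop only; the `attacker_infer` and `anonymizer_rewrite` functions are rule-based stand-ins invented for this example (the paper's actual framework queries LLMs for both roles).

```python
def attacker_infer(text: str) -> list[str]:
    """Stand-in for an LLM attacker: list personal attributes inferable from text."""
    leaks = []
    if "Zurich" in text:
        leaks.append("location: Zurich")
    if "software engineer" in text:
        leaks.append("occupation: software engineer")
    return leaks


def anonymizer_rewrite(text: str, leaks: list[str]) -> str:
    """Stand-in for an LLM anonymizer: generalize the spans behind each leak."""
    for leak in leaks:
        if leak.startswith("location"):
            text = text.replace("Zurich", "a European city")
        if leak.startswith("occupation"):
            text = text.replace("software engineer", "tech worker")
    return text


def adversarial_anonymize(text: str, max_rounds: int = 3) -> str:
    """Alternate attacker inference and anonymizer rewriting until the
    attacker can no longer infer any attribute, or the round budget runs out."""
    for _ in range(max_rounds):
        leaks = attacker_infer(text)
        if not leaks:
            break
        text = anonymizer_rewrite(text, leaks)
    return text


original = "I work as a software engineer near Zurich and bike to the office."
print(adversarial_anonymize(original))
# → "I work as a tech worker near a European city and bike to the office."
```

The key design point, per the abstract, is that the anonymizer is guided by what the attacker actually infers, rather than by a fixed list of entity types as in conventional PII scrubbers.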
Problem

Research questions and friction points this paper is trying to address.

Privacy Protection
Language Models
Anonymization Methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

Anonymization
Advanced Language Models
Privacy Enhancement