🤖 AI Summary
Large language models (LLMs) are vulnerable to privacy violation attacks (PVAs), while existing defenses suffer from exposure risks, high computational overhead, and insufficient robustness. To address these limitations, this paper proposes Retrieval-Confused Generation (RCG), a novel defense paradigm that jointly employs semantics-preserving review rewriting, database semantic perturbation, and a least-relevant retrieval strategy to inject controllable noise into model responses—thereby misleading attackers into extracting incorrect personal information while maintaining high stealth. Crucially, RCG avoids query rejection, preventing adaptive attack evolution. Extensive experiments across two real-world datasets and eight state-of-the-art LLMs demonstrate that RCG improves average defense success rate by 23.7% and reduces inference latency by 68%, significantly outperforming existing anonymization- and rejection-based approaches.
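The summary's first step is a semantics-preserving rewrite of the "user comments" before they enter the database. As a rough illustration only, the sketch below builds a hypothetical paraphrasing prompt of that kind; the paper's actual prompt wording is not given here, and `build_paraphrase_prompt` is an invented helper name.

```python
# Hypothetical sketch of a semantics-preserving paraphrasing prompt in the
# spirit of RCG's first step. The real prompt used by the paper is unknown;
# this only illustrates the idea of asking an LLM to keep meaning while
# stripping attribute-revealing cues.
def build_paraphrase_prompt(user_comment: str) -> str:
    """Wrap a user comment in an instruction asking an LLM to rewrite it,
    preserving meaning but generalizing personally revealing details."""
    return (
        "Rewrite the following comment so that its overall meaning is "
        "preserved, but remove or generalize any cues that could reveal "
        "the author's location, age, occupation, or other personal "
        "attributes.\n\nComment: " + user_comment
    )

prompt = build_paraphrase_prompt(
    "I grab a flat white near Flinders Street every morning."
)
print(prompt)
```

The rewritten comments would then populate the perturbed database that the retrieval step draws from.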
📝 Abstract
Recent advances in large language models (LLMs) have had a profound impact on society and have also raised new security concerns. In particular, owing to the remarkable inference ability of LLMs, the privacy violation attack (PVA), revealed by Staab et al., poses serious personal privacy risks. Existing defense methods mainly leverage LLMs to anonymize the input query, which incurs costly inference time and fails to achieve satisfactory defense performance. Moreover, while directly rejecting the PVA query may seem effective, doing so exposes the defense itself and thereby promotes the evolution of PVAs. In this paper, we propose a novel defense paradigm based on retrieval-confused generation (RCG) of LLMs, which can defend against PVAs efficiently and covertly. We first design a paraphrasing prompt that induces the LLM to rewrite the "user comments" of the attack query, constructing a disturbed database. We then propose a least-relevant retrieval strategy to retrieve the desired user data from the disturbed database. Finally, the "data comments" are replaced with the retrieved user data to form a defended query, causing the LLM to respond to the adversary with incorrect personal attributes, i.e., the attack fails. Extensive experiments on two datasets and eight popular LLMs comprehensively evaluate the feasibility and superiority of the proposed defense method.
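The core retrieval trick in the abstract is to fetch the *least* similar record instead of the most similar one, so the substituted "data comments" mislead attribute inference. A minimal sketch, assuming a toy bag-of-words cosine similarity (the paper's actual embedding model and database schema are not specified, and `least_relevant` is an illustrative name):

```python
# Sketch of the "least-relevant retrieval" step: given the attack query's
# comment, return the database entry LEAST similar to it. Similarity here
# is a toy bag-of-words cosine; a real system would use learned embeddings.
import math
from collections import Counter

def bow_vector(text: str) -> Counter:
    """Lowercased bag-of-words term counts."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def least_relevant(query_comment: str, database: list[str]) -> str:
    """Return the database entry with the LOWEST similarity to the query."""
    q = bow_vector(query_comment)
    return min(database, key=lambda doc: cosine(q, bow_vector(doc)))

db = [
    "I commute daily by tram in Zurich and love alpine hiking.",
    "Retired teacher, spend afternoons gardening in my backyard.",
    "College student juggling part-time barista shifts downtown.",
]
query = "I take the tram every morning and hike in the Alps on weekends."
print(least_relevant(query, db))  # picks the entry sharing no content words
```

Substituting this least-relevant record into the defended query is what steers the LLM toward wrong personal attributes without ever refusing the request.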