On the Privacy of LLMs: An Ablation Study

📅 2026-05-04

🤖 AI Summary
This study examines the privacy threats that large language models face in interactive and retrieval-augmented settings, where it is still unclear how system-level factors influence attack efficacy. The authors propose a unified threat model and use it to reproduce and compare representative privacy attacks (membership inference, attribute inference, data extraction, and backdoor attacks) within a single framework. Through structured ablation studies, they assess how model architecture, scale, dataset characteristics, and retrieval configuration affect attack performance. They find that membership inference attacks, particularly mask-based variants, give strong and reliable signals and that backdoor attacks achieve consistently high success rates, while attribute inference and data extraction reach lower accuracy but still pose significant risks because they target sensitive personal information. The results indicate that privacy vulnerabilities in large language models are highly context-dependent and driven by design choices.
📝 Abstract
Large language models (LLMs) are increasingly deployed in interactive and retrieval-augmented settings, raising significant privacy concerns. While attacks such as Membership Inference (MIA), Attribute Inference (AIA), Data Extraction (DEA), and Backdoor Attacks (BA) have been studied, they are typically analyzed in isolation, leaving a gap in understanding their behavior under common system factors. In this paper, we introduce a unified threat model and notation, reproduce a representative set of privacy attacks, and conduct a structured ablation study to evaluate the impact of key factors such as model architecture, scale, dataset characteristics, and retrieval configuration. Our analysis reveals clear differences across attack types. Membership inference attacks, particularly mask-based variants, exhibit strong and reliable signals, while backdoor attacks achieve consistently high success rates due to their trigger-based nature. In contrast, attribute inference and data extraction attacks remain more challenging, resulting in lower accuracy, yet they pose significant risks as they target sensitive personal information. Overall, these results highlight that privacy risks in LLM systems are highly context-dependent and driven by design choices, emphasizing the need for holistic evaluation and informed deployment practices.
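To make the idea of a structured ablation concrete, the following is a minimal sketch of how an experiment plan over the four factors named above could be enumerated. It assumes a one-factor-at-a-time design around a baseline, which the abstract does not confirm; the `Config` fields, the factor values, and the function names are all hypothetical placeholders, not the authors' setup.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Config:
    """One system configuration; all field values are hypothetical."""
    architecture: str
    scale: str
    dataset: str
    retrieval_top_k: int


# Hypothetical baseline and factor levels (not taken from the paper).
BASELINE = Config("decoder-only", "small", "dataset-a", 3)
FACTORS: Dict[str, list] = {
    "architecture": ["decoder-only", "encoder-decoder"],
    "scale": ["small", "medium", "large"],
    "dataset": ["dataset-a", "dataset-b"],
    "retrieval_top_k": [1, 3, 5],
}
ATTACKS = ["MIA", "AIA", "DEA", "BA"]  # attack types named in the abstract


def ablation_configs(baseline: Config, factors: Dict[str, list]) -> List[Config]:
    """Return the baseline plus every config that changes exactly one factor."""
    configs = [baseline]
    for name, values in factors.items():
        for value in values:
            if value != getattr(baseline, name):
                configs.append(replace(baseline, **{name: value}))
    return configs


def experiment_plan(attacks: List[str], configs: List[Config]) -> List[Tuple[str, Config]]:
    """Pair every attack with every configuration."""
    return [(attack, cfg) for attack in attacks for cfg in configs]


if __name__ == "__main__":
    for attack, cfg in experiment_plan(ATTACKS, ablation_configs(BASELINE, FACTORS)):
        print(attack, cfg)
```

Varying one factor at a time keeps the number of runs linear in the number of factor levels and makes each change in attack performance attributable to a single design choice; a full grid would be the alternative if interactions between factors were of interest.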
Problem

Research questions and friction points this paper is trying to address.

Privacy
Large Language Models
Membership Inference
Attribute Inference
Data Extraction
Innovation

Methods, ideas, or system contributions that make the work stand out.

privacy attacks
ablation study
large language models
unified threat model
retrieval-augmented generation