On the Privacy of LLMs: An Ablation Study

📅 2026-05-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study examines the privacy threats that large language models face in interactive and retrieval-augmented settings, where it remains unclear how system-level factors influence attack efficacy. The authors propose a unified threat model and use it to reproduce and compare representative privacy attacks (membership inference, attribute inference, data extraction, and backdoor attacks) within a consistent framework. Through structured ablation studies, they assess the impact of model architecture, scale, dataset characteristics, and retrieval configuration on attack performance. They find that membership inference attacks, particularly mask-based variants, give strong and reliable signals, and that backdoor attacks achieve consistently high success rates. Attribute inference and data extraction are less accurate but still pose significant risks because they target sensitive personal information. The results underscore that privacy vulnerabilities in large language models are highly context-dependent.

📝 Abstract

Large language models (LLMs) are increasingly deployed in interactive and retrieval-augmented settings, raising significant privacy concerns. While attacks such as Membership Inference (MIA), Attribute Inference (AIA), Data Extraction (DEA), and Backdoor Attacks (BA) have been studied, they are typically analyzed in isolation, leaving a gap in understanding their behavior under common system factors. In this paper, we introduce a unified threat model and notation, reproduce a representative set of privacy attacks, and conduct a structured ablation study to evaluate the impact of key factors such as model architecture, scale, dataset characteristics, and retrieval configuration. Our analysis reveals clear differences across attack types. Membership inference attacks, particularly mask-based variants, exhibit strong and reliable signals, while backdoor attacks achieve consistently high success rates due to their trigger-based nature. In contrast, attribute inference and data extraction attacks remain more challenging, resulting in lower accuracy, yet they pose significant risks as they target sensitive personal information. Overall, these results highlight that privacy risks in LLM systems are highly context-dependent and driven by design choices, emphasizing the need for holistic evaluation and informed deployment practices.
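To make the membership inference setting concrete, the following is a minimal sketch of a generic loss-threshold attack, a common textbook baseline. It is not the mask-based variant evaluated in the paper, whose details the abstract does not give. The function names, the threshold, and the loss values are all hypothetical and chosen only for illustration.

```python
from typing import Sequence


def loss_threshold_mia(losses: Sequence[float], threshold: float) -> list[bool]:
    """Predict membership for each example.

    Assumption: training members tend to have lower loss than non-members,
    so an example is flagged as a member when its loss is below the threshold.
    """
    return [loss < threshold for loss in losses]


def attack_accuracy(predictions: Sequence[bool], labels: Sequence[bool]) -> float:
    """Fraction of examples whose predicted membership matches the true label."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Toy per-example losses, made up for illustration (not results from the paper).
member_losses = [0.2, 0.5, 0.4]
non_member_losses = [1.8, 0.6, 2.1]

losses = member_losses + non_member_losses
labels = [True] * len(member_losses) + [False] * len(non_member_losses)

preds = loss_threshold_mia(losses, threshold=1.0)
acc = attack_accuracy(preds, labels)
print(f"attack accuracy on toy data: {acc:.3f}")
```

In practice the threshold would be calibrated on held-out data, and factors such as model scale or retrieval configuration would change how well member and non-member losses separate, which is the kind of dependence an ablation study measures.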

Problem

Research questions and friction points this paper is trying to address.

Privacy

Large Language Models

Membership Inference

Attribute Inference

Data Extraction

Innovation

Methods, ideas, or system contributions that make the work stand out.

privacy attacks

ablation study

large language models