🤖 AI Summary
Existing membership inference attacks (MIAs) lack fine-grained assessment of privacy risks posed by sensitive entities—such as personally identifiable information (PII) or credit card numbers—in large language models (LLMs). Method: This work introduces the first entity-level MIA task and proposes EL-MIA, a dedicated evaluation framework comprising a new benchmark dataset and a unified pipeline integrating output probabilities, entropy, and gradient-based signals. It incorporates established MIA techniques alongside two newly proposed lightweight strategies for comparative analysis. Contribution/Results: Experiments demonstrate that conventional MIAs fail at the entity level, whereas simple, low-overhead strategies effectively expose entity-level membership leakage. Model scale, training epochs, and other surface-level factors significantly influence entity-level privacy risk. The study reveals critical limitations in current threat models for LLMs and establishes a new, empirically grounded paradigm for fine-grained privacy assessment in generative models.
📝 Abstract
Membership inference attacks (MIAs) aim to infer whether a particular data point is part of a model's training dataset. In this paper, we propose a new task in the context of LLM privacy: entity-level discovery of membership risk focused on sensitive information (PII, credit card numbers, etc.). Existing MIA methods can detect the presence of entire prompts or documents in the LLM training data, but they fail to capture risks at a finer granularity. We propose the "EL-MIA" framework for auditing entity-level membership risks in LLMs and construct a benchmark dataset for evaluating MIA methods on this task. Using this benchmark, we conduct a systematic comparison of existing MIA techniques as well as two newly proposed methods. We provide a comprehensive analysis of the results, relating entity-level MIA susceptibility to model scale, training epochs, and other surface-level factors. Our findings reveal that existing MIA methods are limited when it comes to entity-level membership inference of sensitive attributes, while this susceptibility can be exposed with relatively straightforward methods, highlighting the need for stronger adversaries to stress-test the proposed threat model.
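To make the entity-level setting concrete, a minimal sketch of the kind of probability-based signal such attacks build on follows. This is an illustrative assumption, not the paper's actual method: it scores only the tokens of a sensitive entity span (rather than a whole document) by their mean log-likelihood under the target model, the entity-level analogue of the classic loss-based MIA. The function name and the toy numbers are hypothetical.

```python
def entity_membership_score(token_logprobs, entity_span):
    """Entity-level analogue of a loss-based membership score.

    token_logprobs: per-token log-probabilities of the full sequence
      under the target LM (e.g. collected from one forward pass).
    entity_span: (start, end) token indices of the sensitive entity.

    A higher (less negative) mean log-likelihood over only the entity's
    tokens suggests the model assigns that entity unusually high
    probability in context, hinting at memorization.
    """
    start, end = entity_span
    span = token_logprobs[start:end]
    return sum(span) / len(span)

# Toy illustration (hypothetical values): the entity tokens at
# positions 2-4 are far more probable than the surrounding context.
logprobs = [-3.2, -2.9, -0.1, -0.05, -0.2, -3.5]
score = entity_membership_score(logprobs, (2, 5))
# An attacker would compare this score against a threshold
# calibrated on entities known not to be in the training data.
```

Document-level MIAs average such signals over every token, which dilutes the contribution of a short entity span; restricting the score to the entity's tokens is what makes the assessment fine-grained.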