Membership Inference Attacks on Sequence Models

📅 2025-06-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
Sequence models—including large language models and autoregressive image generators—are prone to memorizing training data, which can leak sensitive information; however, existing membership inference attacks largely assume token-level independence, limiting their accuracy as auditing tools. Method: The authors adapt a state-of-the-art membership inference attack to explicitly model intra-sequence dependencies, jointly leveraging conditional likelihood ratios and generation-trajectory confidence scores and thereby relaxing the restrictive independent-token assumption. The method adds no computational overhead and operates solely on standard generative probability outputs. Contribution/Results: Evaluated in a case study on sequence models, the adapted attack consistently improves inference accuracy at no extra computational cost. It enables more precise quantification of memorization-induced privacy leakage and serves as a stepping stone toward reliable memorization audits for large sequence models.

📝 Abstract
Sequence models, such as Large Language Models (LLMs) and autoregressive image generators, have a tendency to memorize and inadvertently leak sensitive information. While this tendency has critical legal implications, existing tools are insufficient to audit the resulting risks. We hypothesize that those tools' shortcomings are due to mismatched assumptions. Thus, we argue that effectively measuring privacy leakage in sequence models requires leveraging the correlations inherent in sequential generation. To illustrate this, we adapt a state-of-the-art membership inference attack to explicitly model within-sequence correlations, thereby demonstrating how a strong existing attack can be naturally extended to suit the structure of sequence models. Through a case study, we show that our adaptations consistently improve the effectiveness of memorization audits without introducing additional computational costs. Our work hence serves as an important stepping stone toward reliable memorization audits for large sequence models.
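The abstract's core idea—that membership scores should exploit within-sequence correlations rather than treat tokens independently—can be illustrated with a toy sketch. The paper does not specify its exact scoring rule here, so both scoring functions below (and the `decay` parameter) are hypothetical assumptions for illustration, not the authors' method:

```python
import math

def token_logprobs(model_probs):
    """Per-token conditional log-probabilities log p(x_t | x_<t)."""
    return [math.log(p) for p in model_probs]

def independent_score(logps):
    # Baseline MIA score under the independent-token assumption:
    # the mean per-token log-likelihood.
    return sum(logps) / len(logps)

def correlation_aware_score(logps, decay=0.5):
    # Hypothetical correlation-aware score: each token's contribution
    # carries exponentially smoothed confidence from preceding tokens,
    # crudely modeling within-sequence dependence (decay is illustrative).
    score, run = 0.0, 0.0
    for lp in logps:
        run = decay * run + lp  # smoothed local confidence
        score += run
    return score / len(logps)

# Toy example: a "memorized" sequence (uniformly confident tokens)
# vs. an unseen one (mixed confidence). Both scores separate them.
member = token_logprobs([0.9, 0.85, 0.92, 0.88])
nonmember = token_logprobs([0.4, 0.7, 0.2, 0.5])
print(independent_score(member) > independent_score(nonmember))        # True
print(correlation_aware_score(member) > correlation_aware_score(nonmember))  # True
```

Both scores rank the memorized sequence higher, consistent with the paper's premise that scores built from standard generative probability outputs suffice for auditing; the correlation-aware variant additionally rewards sustained runs of confident predictions, which the independent baseline cannot distinguish.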
Problem

Research questions and friction points this paper is trying to address.

Detecting privacy leaks in sequence models
Improving membership inference attack effectiveness
Auditing memorization risks without additional computational cost
Innovation

Methods, ideas, or system contributions that make the work stand out.

Modeling within-sequence correlations in sequential generation
Adapting a state-of-the-art membership inference attack to sequence structure
Improving memorization audits at no extra computational cost