🤖 AI Summary
This study addresses the lack of systematic analysis of privacy risks in fine-tuned diffusion language models. It reveals, for the first time, the high vulnerability of such models to membership inference attacks and proposes a novel Sub-ensemble Aggregation Membership Attack (SAMA). SAMA samples mask subsets at varying densities and combines sign statistics with inverse-weighted aggregation to suppress heavy-tailed noise and enhance detection of sparse memorized signals. Extensive experiments across nine datasets demonstrate that SAMA substantially outperforms existing baselines, achieving an average relative AUC improvement of 30% and up to an 8-fold increase in attack success rate at low false positive rates.
📝 Abstract
Diffusion Language Models (DLMs) represent a promising alternative to autoregressive language models, using bidirectional masked token prediction. Yet their susceptibility to privacy leakage via Membership Inference Attacks (MIA) remains critically underexplored. This paper presents the first systematic investigation of MIA vulnerabilities in DLMs. Unlike autoregressive models' single fixed prediction pattern, DLMs' multiple maskable configurations exponentially increase attack opportunities. This ability to probe many independent masks dramatically improves detection chances. To exploit this, we introduce SAMA (Subset-Aggregated Membership Attack), which addresses the sparse signal challenge through robust aggregation. SAMA samples masked subsets across progressive densities and applies sign-based statistics that remain effective despite heavy-tailed noise. Through inverse-weighted aggregation prioritizing sparse masks' cleaner signals, SAMA transforms sparse memorization detection into a robust voting mechanism. Experiments on nine datasets show SAMA achieves a 30% relative AUC improvement over the best baseline, with up to an 8-fold improvement at low false positive rates. These findings reveal significant, previously unknown vulnerabilities in DLMs, necessitating the development of tailored privacy defenses.
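To make the aggregation idea concrete, here is a minimal sketch of the mask-sampling, sign-statistic, and inverse-weighted voting pipeline the abstract describes. All names (`sama_score`, `token_losses_fn`, the specific densities, and the use of a loss-gap sign as the per-token vote) are hypothetical illustrations, not the authors' implementation; in particular, we assume the caller supplies a function that returns calibrated per-token loss gaps (negative when the fine-tuned DLM is unexpectedly confident on a masked token) for any chosen mask.

```python
import numpy as np

def sama_score(token_losses_fn, seq_len,
               densities=(0.1, 0.25, 0.5), n_subsets=16, rng=None):
    """Illustrative sketch of subset-aggregated membership scoring.

    token_losses_fn(mask) -> array of calibrated loss gaps for the masked
    positions (assumption: negative gap = model is suspiciously confident,
    hinting at memorization). Returns a score in [-1, 1]; higher suggests
    the sequence was a training member.
    """
    rng = np.random.default_rng(rng)
    score, total_w = 0.0, 0.0
    for rho in densities:
        # Inverse weighting: sparser masks carry cleaner per-token
        # signals, so they receive proportionally larger weight.
        w = 1.0 / rho
        for _ in range(n_subsets):
            k = max(1, int(rho * seq_len))
            mask = rng.choice(seq_len, size=k, replace=False)
            gaps = token_losses_fn(mask)
            # Sign statistic: each masked token casts a +1/-1 vote,
            # which is robust to heavy-tailed outliers in the gaps.
            score += w * np.sign(-gaps).mean()
            total_w += w
    return score / total_w
```

Because each subset contributes only its vote's sign rather than its raw magnitude, a few extreme loss values cannot dominate the final score, which is the robustness property the abstract attributes to the sign-based statistic.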