Extracting Training Data from Diffusion Language Models via Infilling

📅 2026-05-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the underexplored memorization risks of diffusion language models. Prior evaluations have largely been confined to prefix-conditioned extraction, which overlooks the additional leakage enabled by bidirectional denoising. The authors propose an infilling (“fill-in-the-blank”) extraction protocol parameterized by an arbitrary binary mask, so that the extractability of training data can be assessed systematically under masks that model realistic attack scenarios. Experiments on LLaDA-8B and Dream-7B across five extraction modes, three training pipelines, and three corpora show that mask geometry governs extractability: edge-conditioned masks recover up to three times as many verbatim sequences as prefix-conditioned ones. A follow-up supervised fine-tuning stage does not eliminate prior memorization, and an adversary holding training data with redacted personally identifiable information can achieve higher recall on redacted email addresses from diffusion language models than from scale-matched autoregressive models. These findings indicate that bidirectional context increases extraction success and that prefix-only probing underestimates the privacy risk.

📝 Abstract

Memorization in large language models has been studied almost exclusively through prefix-conditioned extraction, a natural choice for autoregressive models. However, diffusion language models (DLMs) can denoise masked tokens at arbitrary positions. Thus, prefix-only probing reveals only one facet of memorization in DLMs and significantly underestimates the risk of training-data extraction. To realistically model the extractability of training data in DLMs, we introduce infilling extraction, a data-extraction protocol parameterized by an arbitrary binary mask that subsumes prefix-only probing and accounts for the bidirectional inductive bias of DLMs. Instantiating it on LLaDA-8B and Dream-7B across five extraction modes, three training pipelines, and three corpora covering verbatim and partial leakage, we find that mask geometry governs extractability: edge-conditioned masks extract up to three times more verbatim sequences than prefix-conditioned ones, and bidirectional access opens channels inaccessible in autoregressive models. In particular, we show that a realistic adversary with access to training data in which personally identifiable information has been redacted can achieve higher recall when extracting redacted email addresses from DLMs than from scale-matched autoregressive models. Tunable decoding parameters measurably affect extraction performance, while a follow-up supervised finetuning stage does not eliminate the prior memorization.
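
To make the idea of a mask-parameterized extraction protocol concrete, here is a minimal Python sketch of how a binary mask might select which tokens an adversary sees and which the model must recover. It is an illustration under stated assumptions, not the authors' implementation: the function names, the `<mask>` placeholder, and the `infill` callable (a stand-in for a DLM's denoising step) are all hypothetical, and "edge-conditioned" is assumed here to mean that tokens at both ends are visible while the middle is hidden, which the abstract does not spell out.

```python
from typing import Callable, List, Sequence

MASK = "<mask>"  # hypothetical placeholder; real DLMs use their own mask token


def prefix_mask(n: int, n_visible: int) -> List[bool]:
    """Prefix-conditioned mask: True marks a hidden position to be recovered."""
    return [i >= n_visible for i in range(n)]


def edge_mask(n: int, n_left: int, n_right: int) -> List[bool]:
    """Assumed edge-conditioned mask: both ends visible, the middle hidden."""
    return [n_left <= i < n - n_right for i in range(n)]


def apply_mask(tokens: Sequence[str], mask: Sequence[bool]) -> List[str]:
    """Replace every hidden position with the mask placeholder."""
    return [MASK if hidden else tok for tok, hidden in zip(tokens, mask)]


def is_verbatim(
    tokens: Sequence[str],
    mask: Sequence[bool],
    infill: Callable[[List[str]], List[str]],
) -> bool:
    """Check whether the infiller reproduces every hidden token exactly."""
    guess = infill(apply_mask(tokens, mask))
    if len(guess) != len(tokens):
        return False
    return all(g == t for g, t, hidden in zip(guess, tokens, mask) if hidden)
```

Under this framing, prefix-only probing is simply one choice of mask, and an extraction rate for a given mask geometry would be the fraction of training sequences for which `is_verbatim` holds when `infill` wraps the model under test.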

Problem

Research questions and friction points this paper is trying to address.

diffusion language models

training data extraction

memorization

infilling

data privacy

Innovation

Methods, ideas, or system contributions that make the work stand out.

infilling extraction

diffusion language models

training data memorization