Black-box Membership Inference Attacks on the Pre-training Data of Image-generation Models

📅 2026-05-26

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing black-box membership inference attacks struggle to identify individual samples from the pre-training data of diffusion models, because their membership signals weaken for low-exposure instances. This work proposes SD-MIA, a black-box membership inference framework that requires no access to internal model features and is therefore applicable to closed-source platforms. SD-MIA analyzes the model's denoising responses to a target image paired with perturbed text prompts, using this cross-modal perturbation mechanism to extract more discriminative membership signals. In experiments on a public benchmark and a newly constructed dataset, SD-MIA outperforms existing baselines, including methods that rely on access to internal model features.

📝 Abstract

The rapid advancement of diffusion-based image generation models has raised serious concerns regarding potential copyright and privacy infringements involving human-created data. Membership inference attacks (MIAs) have emerged as a promising tool for identifying unauthorized data usage during model training. Existing methods typically use a model's ability to denoise perturbed suspect images as an indicator of membership status. However, the discriminative power of such features depends heavily on the degree of model memorization and deteriorates significantly for less exposed data (e.g., pre-training data). Although several methods attempt to enhance detection by leveraging internal model features, these features are generally inaccessible on mainstream closed-source image generation platforms, which limits their practicality. In this paper, we demonstrate that analyzing how a black-box diffusion model denoises a target image together with perturbed versions of its corresponding textual instructions can reveal more distinctive membership cues. Based on this insight, we propose a black-box membership inference attack framework (named SD-MIA) that leverages a cross-modal data perturbation mechanism to detect pre-training data in diffusion models. We conduct extensive experiments on both a public benchmark dataset and a newly constructed dataset, each comprising pre-training member and non-member samples drawn from identical distributions. Experimental results demonstrate that SD-MIA achieves superior performance compared to existing baselines, including those with the unfair advantage of accessing internal model features.
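
The abstract does not give the exact SD-MIA statistic, so the following is only a minimal sketch of the general idea it describes: perturb both the image and its text prompt, query a black-box denoiser, and use the reconstruction error as a membership signal. Everything here is an assumption made for illustration: the function names (`perturb_image`, `perturb_prompt`, `membership_score`, `make_toy_denoiser`), the Gaussian-noise and word-dropout perturbations, the mean-squared-error score, and the toy denoiser that stands in for a real image-generation API.

```python
import numpy as np


def perturb_image(image, noise_scale, rng):
    """Add Gaussian noise to an image with values in [0, 1] (assumed perturbation)."""
    noisy = image + rng.normal(0.0, noise_scale, size=image.shape)
    return np.clip(noisy, 0.0, 1.0)


def perturb_prompt(prompt, drop_prob, rng):
    """Randomly drop words from the prompt (one simple, assumed text perturbation)."""
    words = prompt.split()
    if not words:
        return prompt
    kept = [w for w in words if rng.random() >= drop_prob]
    return " ".join(kept) if kept else words[0]


def membership_score(image, prompt, denoise, n_queries=8,
                     noise_scale=0.3, drop_prob=0.3, seed=0):
    """Return a score that is higher when the image looks more like a training member.

    `denoise(noisy_image, text)` is a hypothetical black-box callable; only its
    output image is used, never any internal feature of the model.
    """
    rng = np.random.default_rng(seed)
    errors = []
    for _ in range(n_queries):
        noisy = perturb_image(image, noise_scale, rng)
        text = perturb_prompt(prompt, drop_prob, rng)
        restored = denoise(noisy, text)
        errors.append(float(np.mean((restored - image) ** 2)))
    # Lower reconstruction error is read as stronger evidence of membership.
    return -float(np.mean(errors))


def make_toy_denoiser(memorized, pull=0.8):
    """Toy stand-in for a model: pulls the input toward its nearest memorized image.

    The prompt is ignored here; a real model would condition on it.
    """
    def denoise(noisy, prompt):
        nearest = min(memorized, key=lambda m: float(np.mean((m - noisy) ** 2)))
        return pull * nearest + (1.0 - pull) * noisy
    return denoise


# Toy data: two "training" images and one held-out image, all 8x8 grayscale.
data_rng = np.random.default_rng(42)
member = data_rng.random((8, 8))
other_member = data_rng.random((8, 8))
non_member = data_rng.random((8, 8))

toy_denoise = make_toy_denoiser([member, other_member])
member_score = membership_score(member, "a photo of a cat", toy_denoise)
non_member_score = membership_score(non_member, "a photo of a cat", toy_denoise)
```

In this toy setup the memorized image is reconstructed more accurately than the held-out one, so `member_score` exceeds `non_member_score`. With a real closed-source platform, `denoise` would wrap whatever image-to-image endpoint is available, and the decision threshold would have to be calibrated on known member and non-member samples.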

Problem

Research questions and friction points this paper is trying to address.

Membership Inference Attacks

Diffusion Models

Pre-training Data

Black-box Setting

Image Generation

Innovation

Methods, ideas, or system contributions that make the work stand out.

black-box membership inference

diffusion models

cross-modal perturbation