Repurposing Image Diffusion Models for Adversarial Synthetic Structured Data: A Case Study of Ground Truth Drift

📅 2026-05-01

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study argues that adversaries can repurpose off-the-shelf image diffusion models, here the unmodified U-Net from Stable Diffusion, to generate deceptive structured synthetic data that induces “ground truth drift” in AI pipelines. Each row of the UCI Adult Income tabular dataset is reshaped into a small single-channel pseudo-image, and several feature layouts are tested, because the architecture's inductive bias toward spatial locality makes feature placement a design variable; the pretrained model itself is not modified. The work distinguishes statistical realism (holding up to a machine's correlation audits) from perceptual realism (holding up to a human's sensory inspection) and introduces “synthetic evidence” as a category alongside synthetic media: AI-generated material whose consumer is a machine in a closed evidentiary pipeline. An attacker succeeds by anticipating how the receiving machine evaluates data, and that success enables ground truth drift: the silent reclassification of AI-generated outputs as authentic when they are reused in pipelines that do not interrogate their provenance.

📝 Abstract

Public image diffusion models are now powerful enough that an attacker without the resources to train a tabular-specific generator may repurpose one off the shelf. This study tests that possibility directly. An unmodified Stable Diffusion U-Net is applied to the UCI Adult Income dataset by reshaping each row into a small single-channel pseudo-image. The architecture's inductive bias toward spatial locality makes feature placement a design variable, and several layouts are tested. However, this is only the beginning of the story, as this paper also draws two philosophical distinctions. One separates statistical from perceptual realism: whether synthetic content holds up to a machine's correlation audits or a human's sensory inspection. The other introduces synthetic evidence as a category alongside synthetic media: AI-generated material whose consumer is a machine in a closed evidentiary pipeline rather than a person in an open information system. An attacker succeeds with synthetic evidence by thinking like the machine that will receive it. And the more the attacker succeeds, the more they can induce ground truth drift: the silent reclassification of AI-generated outputs as authentic when reused in pipelines that do not interrogate their provenance.
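The row-to-pseudo-image step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the 4x4 grid, the 14-value row, and both layouts are assumptions made for illustration, and the row is assumed to be already encoded and scaled to numeric values. The `layout` array stands in for the "feature placement" design variable: it decides which pixel each feature occupies, and so which features end up spatially adjacent.

```python
import numpy as np

def row_to_pseudo_image(row, layout, side):
    """Write a 1-D feature vector into a (1, side, side) single-channel array.

    layout[i] is the flat pixel index that feature i is written to.
    Pixels not named in layout keep the pad value 0.0.
    (Hypothetical helper; the paper's actual layouts are not given here.)
    """
    row = np.asarray(row, dtype=np.float32)
    layout = np.asarray(layout, dtype=np.int64)
    if row.shape != layout.shape:
        raise ValueError("row and layout must have the same length")
    if len(np.unique(layout)) != len(layout):
        raise ValueError("layout must assign each feature a distinct pixel")
    if layout.min() < 0 or layout.max() >= side * side:
        raise ValueError("layout index outside the image")
    flat = np.zeros(side * side, dtype=np.float32)
    flat[layout] = row
    return flat.reshape(1, side, side)

def pseudo_image_to_row(image, layout):
    """Invert row_to_pseudo_image: read the features back in original order."""
    return np.asarray(image).reshape(-1)[np.asarray(layout, dtype=np.int64)]

# Illustrative data: 14 numeric values standing in for one encoded row.
rng = np.random.default_rng(0)
example_row = rng.uniform(-1.0, 1.0, size=14).astype(np.float32)

# Two candidate layouts: row-major order, and a random placement.
row_major_layout = np.arange(14)
shuffled_layout = rng.permutation(16)[:14]

image_a = row_to_pseudo_image(example_row, row_major_layout, side=4)
image_b = row_to_pseudo_image(example_row, shuffled_layout, side=4)
```

Both images hold the same values, and `pseudo_image_to_row` recovers the row from either one. Only the spatial neighbourhoods differ, which is the property a locality-biased architecture would be sensitive to.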

Problem

Research questions and friction points this paper is trying to address.

adversarial synthetic data

ground truth drift

diffusion models

structured data

synthetic evidence

Innovation

Methods, ideas, or system contributions that make the work stand out.

diffusion models

synthetic structured data

ground truth drift