🤖 AI Summary
Text-to-image diffusion models inherit societal biases from their training data (e.g., LAION-5B), leading to stereotyped generations. To address this, we propose an entirely unsupervised, test-time debiasing method applicable to any UNet-based diffusion model—requiring no human annotations, external classifiers, or model fine-tuning. Our approach automatically discovers semantic clusters in the embedding space of a frozen pre-trained image encoder and dynamically steers the diffusion sampling process by minimizing the KL divergence between the output distribution and a uniform reference distribution over these clusters. This real-time, inference-stage guidance is model-agnostic and plug-and-play. Experiments demonstrate significant mitigation of bias across gender, race, and abstract conceptual dimensions on diverse prompts and mainstream models—including Stable Diffusion 1.5 and SDXL—while preserving image fidelity and diversity.
📝 Abstract
Text-to-image (T2I) diffusion models have achieved widespread success due to their ability to generate high-resolution, photorealistic images. These models are trained on large-scale datasets, such as LAION-5B, that are often scraped from the internet. Because this data contains numerous biases, the models inherently learn and reproduce them, resulting in stereotypical outputs. We introduce SelfDebias, a fully unsupervised test-time debiasing method applicable to any diffusion model that uses a UNet as its noise predictor. SelfDebias identifies semantic clusters in an image encoder's embedding space and uses these clusters to guide the diffusion process during inference, minimizing the KL divergence between the output distribution over clusters and the uniform distribution. Unlike supervised approaches, SelfDebias requires neither human-annotated datasets nor external classifiers trained for each generated concept; instead, it automatically identifies semantic modes. Extensive experiments show that SelfDebias generalizes across prompts and diffusion model architectures, including both conditional and unconditional models. It effectively debiases generations not only along key demographic dimensions, while maintaining the visual fidelity of the generated images, but also along more abstract concepts for which even identifying the bias is challenging.
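The objective described above can be illustrated with a minimal NumPy sketch: embeddings from a frozen image encoder are softly assigned to discovered cluster centers, the assignments are averaged into a batch-level cluster distribution, and the KL divergence of that distribution from uniform is the quantity the guidance would push down. The squared-distance softmax assignment, the temperature `tau`, and the function name are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def kl_to_uniform(embeddings, centers, tau=0.1):
    """KL divergence between the batch's soft cluster distribution and uniform.

    embeddings: (n, d) array of frozen-encoder embeddings for generated images
    centers:    (k, d) array of cluster centers discovered in embedding space
    tau:        softmax temperature for soft assignment (illustrative choice)
    """
    # Squared Euclidean distance of each embedding to each cluster center.
    d2 = ((embeddings[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    # Soft cluster assignments via a temperature-scaled softmax (stabilized).
    logits = -d2 / tau
    logits -= logits.max(axis=1, keepdims=True)
    q = np.exp(logits)
    q /= q.sum(axis=1, keepdims=True)
    # Batch-level distribution over semantic clusters.
    p = q.mean(axis=0)
    k = centers.shape[0]
    u = np.full(k, 1.0 / k)  # uniform reference distribution
    return float((p * np.log((p + 1e-12) / u)).sum())
```

In the actual method this scalar would be differentiated with respect to the diffusion latent and used to steer each sampling step; here the sketch only shows the objective itself: a batch evenly spread across clusters yields a KL near zero, while a batch collapsed onto one cluster yields a strictly positive value.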