🤖 AI Summary
Text-to-image diffusion models inherit societal biases from their training data (e.g., LAION-5B), leading to stereotyped generations. To address this, we propose an entirely unsupervised, test-time debiasing method applicable to any UNet-based diffusion model—requiring no human annotations, external classifiers, or model fine-tuning. Our approach automatically discovers semantic clusters in the embedding space of a frozen pre-trained image encoder and dynamically steers the diffusion sampling process by minimizing the KL divergence between the output distribution and a uniform reference distribution over these clusters. This real-time, inference-stage guidance is model-agnostic and plug-and-play. Experiments demonstrate significant mitigation of bias across gender, race, and abstract conceptual dimensions on diverse prompts and mainstream models—including Stable Diffusion 1.5 and SDXL—while preserving image fidelity and diversity.
📝 Abstract
Text-to-image (T2I) diffusion models have achieved widespread success due to their ability to generate high-resolution, photorealistic images. These models are trained on large-scale datasets, such as LAION-5B, that are often scraped from the internet. Because this data contains numerous biases, the models inherently learn and reproduce them, resulting in stereotypical outputs. We introduce SelfDebias, a fully unsupervised test-time debiasing method applicable to any diffusion model that uses a UNet as its noise predictor. SelfDebias identifies semantic clusters in an image encoder's embedding space and uses these clusters to guide the diffusion process during inference, minimizing the KL divergence between the output distribution over clusters and the uniform distribution. Unlike supervised approaches, SelfDebias requires neither human-annotated datasets nor external classifiers trained for each generated concept; instead, it automatically identifies semantic modes. Extensive experiments show that SelfDebias generalizes across prompts and diffusion model architectures, including both conditional and unconditional models. It effectively debiases generations not only along key demographic dimensions, while maintaining the visual fidelity of the generated images, but also along more abstract concepts for which even identifying the bias is challenging.
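The objective described above can be illustrated with a minimal NumPy sketch: embeddings from a frozen image encoder are softly assigned to discovered cluster centers, the assignments are averaged into a batch-level cluster distribution, and the KL divergence of that distribution from uniform is the quantity the guidance would push down. The squared-distance softmax assignment, the temperature `tau`, and the function name are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def kl_to_uniform(embeddings, centers, tau=0.1):
    """KL divergence between the batch's soft cluster distribution and uniform.

    embeddings: (n, d) array of frozen-encoder embeddings for generated images
    centers:    (k, d) array of cluster centers discovered in embedding space
    tau:        softmax temperature for soft assignment (illustrative choice)
    """
    # Squared Euclidean distance of each embedding to each cluster center.
    d2 = ((embeddings[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    # Soft cluster assignments via a temperature-scaled softmax (stabilized).
    logits = -d2 / tau
    logits -= logits.max(axis=1, keepdims=True)
    q = np.exp(logits)
    q /= q.sum(axis=1, keepdims=True)
    # Batch-level distribution over semantic clusters.
    p = q.mean(axis=0)
    k = centers.shape[0]
    u = np.full(k, 1.0 / k)  # uniform reference distribution
    return float((p * np.log((p + 1e-12) / u)).sum())
```

In the actual method this scalar would be differentiated with respect to the diffusion latent and used to steer each sampling step; here the sketch only shows the objective itself: a batch evenly spread across clusters yields a KL near zero, while a batch collapsed onto one cluster yields a strictly positive value.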