AI Summary
Diffusion models (DMs) often inherit demographic biases (e.g., gender bias) from their training data in face generation, and existing debiasing approaches typically require costly retraining or additional annotated data. This paper proposes a novel, training-free, reference-free debiasing method guided by latent-space attribute distributions. First, we design a lightweight Attribute Distribution Predictor (ADP) that leverages semantic features from the denoising U-Net's hidden layers and is trained with pseudo-labels, requiring no manual annotation. Second, we impose attribute distribution constraints directly on the diffusion sampling process to dynamically calibrate generation trajectories. To our knowledge, this is the first work to tightly couple latent-space semantic distribution modeling with the diffusion process itself. Our method significantly reduces bias in both single- and multi-attribute settings, consistently outperforms baselines on unconditional and text-to-image DMs, and improves the fairness metrics of downstream classifiers.
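The ADP described above can be pictured as a small MLP head on top of pooled U-Net activations. The following is a minimal sketch, not the authors' implementation; the class name, feature dimension, and pooling choice are all assumptions for illustration:

```python
import torch
import torch.nn as nn

class AttributeDistributionPredictor(nn.Module):
    """Hypothetical sketch of the ADP: a small MLP mapping pooled
    denoising-U-Net hidden features to a distribution over attribute
    classes (e.g., 2 classes for binary gender)."""

    def __init__(self, feat_dim=1280, hidden_dim=256, num_classes=2):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, unet_features):
        # unet_features: (B, C, H, W) hidden activations from the U-Net
        pooled = unet_features.mean(dim=(2, 3))   # global average pool -> (B, C)
        return self.mlp(pooled).softmax(dim=-1)   # per-image attribute distribution

# Example: a batch of 4 feature maps -> per-image attribute probabilities
adp = AttributeDistributionPredictor(feat_dim=1280, num_classes=2)
probs = adp(torch.randn(4, 1280, 16, 16))
print(probs.shape)  # torch.Size([4, 2]); each row sums to 1
```

In the paper's setup such a head would be supervised with pseudo-labels from an off-the-shelf attribute classifier rather than manual annotations.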
Abstract
Diffusion Models (DMs) have emerged as powerful generative models with unprecedented image generation capability. These models are widely used for data augmentation and creative applications. However, DMs reflect the biases present in their training datasets. This is especially concerning in the context of faces, where a DM may prefer one demographic subgroup over others (e.g., female over male). In this work, we present a method for debiasing DMs without relying on additional reference data or model retraining. Specifically, we propose Distribution Guidance, which enforces that the generated images follow a prescribed attribute distribution. To realize this, we build on the key insight that the latent features of the denoising U-Net hold rich demographic semantics, which can be leveraged to guide debiased generation. We train an Attribute Distribution Predictor (ADP), a small MLP that maps the latent features to a distribution over attributes. The ADP is trained with pseudo-labels generated by existing attribute classifiers. The proposed Distribution Guidance with the ADP enables fair generation. Our method reduces bias across single and multiple attributes and outperforms the baselines by a significant margin for both unconditional and text-conditional diffusion models. Further, we present a downstream task of training a fair attribute classifier by augmenting the training set with our generated data. Code is available at the project page.
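A guidance step like the one described above can be sketched as a classifier-guidance-style update: predict per-image attribute probabilities with the ADP, compare the batch-level distribution to the prescribed target, and nudge the latents along the gradient of that mismatch. This is a toy sketch under assumptions, not the paper's code; the function name, loss choice (cross-entropy to the target), and the stand-in feature extractor are all hypothetical:

```python
import torch
import torch.nn as nn

def distribution_guidance_step(x_t, feature_fn, adp, target_dist, scale=1.0):
    """One hypothetical guidance step: move the batch of noisy latents x_t
    so the batch-level attribute distribution predicted by the ADP shifts
    toward the prescribed target (e.g., a 50/50 gender split)."""
    x_t = x_t.detach().requires_grad_(True)
    probs = adp(feature_fn(x_t))                 # (B, K) per-image attribute probs
    batch_dist = probs.mean(dim=0)               # empirical distribution over the batch
    # Cross-entropy between target and batch distribution as the guidance loss
    loss = -(target_dist * torch.log(batch_dist + 1e-8)).sum()
    grad = torch.autograd.grad(loss, x_t)[0]
    return (x_t - scale * grad).detach()         # gradient step toward the target

# Toy stand-ins for the U-Net feature extractor and the ADP (illustration only)
feature_fn = nn.Flatten()                                       # treat latents as features
adp = nn.Sequential(nn.Linear(3 * 8 * 8, 2), nn.Softmax(dim=-1))
x_t = torch.randn(16, 3, 8, 8)                                  # batch of noisy latents
target = torch.tensor([0.5, 0.5])                               # prescribed attribute split
x_guided = distribution_guidance_step(x_t, feature_fn, adp, target, scale=0.1)
print(x_guided.shape)  # torch.Size([16, 3, 8, 8])
```

In practice this update would be interleaved with the ordinary denoising steps of the sampler, with the real U-Net hidden features in place of the flattened latents used here.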