🤖 AI Summary
Text-to-image diffusion models (e.g., Stable Diffusion) tend to amplify societal biases, such as gender and racial stereotypes, in generated images. To address this, we propose a training-free, post-hoc debiasing framework that operates by intervening on prompt embeddings within the CLIP embedding space, enabling joint mitigation of multiple bias attributes. Our key contributions are threefold: (1) the first application of Fair Principal Component Analysis (Fair PCA) for dimensionality reduction in this context; (2) a cross-group unified projection strategy coupled with empirically calibrated noise injection to alleviate intersectional bias while preserving semantic fidelity; and (3) a model-agnostic, plug-and-play design. Experiments demonstrate substantial improvements in fairness across gender, race, and their intersections, with only a moderate trade-off in image quality and prompt adherence, outperforming existing post-hoc debiasing methods.
📝 Abstract
Text-to-image diffusion models, such as Stable Diffusion, have demonstrated remarkable capabilities in generating high-quality and diverse images from natural language prompts. However, recent studies reveal that these models often replicate and amplify societal biases, particularly along demographic attributes like gender and race. In this paper, we introduce FairImagen (https://github.com/fuzihaofzh/FairImagen), a post-hoc debiasing framework that operates on prompt embeddings to mitigate such biases without retraining or modifying the underlying diffusion model. Our method integrates Fair Principal Component Analysis to project CLIP-based input embeddings into a subspace that minimizes group-specific information while preserving semantic content. We further enhance debiasing effectiveness through empirical noise injection and propose a unified cross-demographic projection method that enables simultaneous debiasing across multiple demographic attributes. Extensive experiments across gender, race, and intersectional settings demonstrate that FairImagen significantly improves fairness with a moderate trade-off in image quality and prompt fidelity. Our framework outperforms existing post-hoc methods and offers a simple, scalable, and model-agnostic solution for equitable text-to-image generation.
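To make the core idea concrete, the sketch below illustrates the general family of techniques the abstract describes: projecting prompt embeddings onto a subspace that removes a demographic direction, followed by small noise injection. This is an illustrative simplification, not FairImagen's actual implementation — it removes a single mean-difference direction, whereas Fair PCA selects a projection subspace jointly with variance preservation; the toy embeddings, dimensions, and `noise_scale` value are all assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for CLIP prompt embeddings (illustrative only; FairImagen
# operates on real CLIP embeddings of text prompts).
dim = 512
group_a = rng.normal(0.0, 1.0, (100, dim)) + 0.5  # prompts associated with group A
group_b = rng.normal(0.0, 1.0, (100, dim)) - 0.5  # prompts associated with group B

# One-direction approximation of a fair projection: remove the axis along
# the difference of group means. Fair PCA generalizes this by choosing a
# whole subspace that minimizes group information while keeping variance.
bias_dir = group_a.mean(axis=0) - group_b.mean(axis=0)
bias_dir /= np.linalg.norm(bias_dir)

def debias(embedding: np.ndarray, noise_scale: float = 0.0) -> np.ndarray:
    """Project out the bias direction, then optionally inject noise."""
    debiased = embedding - np.outer(embedding @ bias_dir, bias_dir)
    if noise_scale > 0:
        # Empirical noise injection: small perturbations can further mask
        # residual group information at a modest cost to prompt fidelity.
        debiased = debiased + rng.normal(0.0, noise_scale, debiased.shape)
    return debiased

prompt_emb = rng.normal(0.0, 1.0, (1, dim))
debiased_emb = debias(prompt_emb)  # orthogonal to bias_dir (up to float error)
```

Because the method only transforms the prompt embedding before it enters the diffusion model, it is plug-and-play: no retraining or access to the model's weights is required.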