🤖 AI Summary
This paper investigates the origins of attribute bias and evaluation distortion in unconditional image generation. Existing bias assessments rely on attribute classifiers whose outputs are highly sensitive to where their decision boundaries fall; to address this limitation, we quantify bias against a reference distribution, combining multiple generative architectures (diffusion models and GANs), pretrained attribute classifiers, and statistical sensitivity analysis. Key findings: (1) the intrinsic attribute shift between training and generated distributions is empirically small; (2) much of the apparent bias stems from classifier sensitivity to spectrum-valued attributes whose decision boundaries fall in high-density regions, not from inherent model bias; (3) current evaluation paradigms thereby introduce systematic distortion. The central claim is that the evaluation framework itself can be the dominant confounder in bias detection, challenging prevailing attribution assumptions. This calls for more robust, socially aware fairness evaluation practices grounded in distributional analysis rather than classifier-dependent heuristics.
📝 Abstract
The widespread adoption of generative AI models has raised concerns about representational harm and potential discriminatory outcomes. Yet, despite a growing literature on this topic, the mechanisms by which bias emerges, especially in unconditional generation, remain poorly understood. We define the bias of an attribute as the difference between the probability of its presence in the observed distribution and its expected proportion in an ideal reference distribution. In our analysis, we train a set of unconditional image generative models and adopt a commonly used bias evaluation framework to study bias shift between the training and generated distributions. Our experiments reveal that the detected attribute shifts are small. We find that these shifts are sensitive to the attribute classifier used to label generated images in the evaluation framework, particularly when its decision boundaries fall in high-density regions. Our empirical analysis indicates that this classifier sensitivity is most often observed for attributes whose values lie on a spectrum rather than being binary. These findings highlight the need for more representative labeling practices, greater scrutiny of evaluation frameworks to understand their shortcomings, and recognition of the socially complex nature of attributes when evaluating bias.
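To make the definition concrete, here is a minimal sketch of the observed-versus-reference notion of bias and of the boundary-sensitivity effect the abstract describes. The function name `attribute_bias`, the Gaussian score distribution, and the thresholds are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def attribute_bias(labels: np.ndarray, reference_proportion: float) -> float:
    """Bias of an attribute: its observed prevalence minus its expected
    proportion under an ideal reference distribution."""
    return labels.mean() - reference_proportion

# Hypothetical continuous attribute scores for 100k generated images
# (e.g., classifier logits for a spectrum-like attribute). Most of the
# mass sits near the decision boundary at 0, mimicking a boundary that
# falls in a high-density region.
scores = rng.normal(loc=0.0, scale=1.0, size=100_000)

reference = 0.5  # assumed ideal reference proportion for the attribute

# A small shift in the classifier's decision threshold produces a large
# swing in measured "bias" when the density near the boundary is high.
for threshold in (-0.1, 0.0, 0.1):
    labels = (scores > threshold).astype(float)
    print(f"threshold={threshold:+.1f}  "
          f"measured bias={attribute_bias(labels, reference):+.4f}")
```

On standard-normal scores, shifting the threshold by just ±0.1 moves the measured prevalence by roughly four percentage points, so boundary placement alone can manufacture apparent bias even when the generated distribution matches the reference.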