BiasIG: Benchmarking Multi-dimensional Social Biases in Text-to-Image Models

📅 2026-04-13

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the insufficient characterization of the multidimensional nature of social bias in current text-to-image (T2I) generation models, a gap that hinders comprehensive diagnosis of systemic discrimination. To this end, we propose BiasIG, the first multidimensional, decoupled bias evaluation benchmark grounded in sociological and machine ethics frameworks. It covers four core dimensions with 47,040 structured prompts and includes a fully automated evaluation pipeline. By leveraging a fine-tuned multimodal large language model integrated with an alignment verification mechanism, BiasIG reaches consistency comparable to human experts in bias detection. Experiments show that existing debiasing methods have confounding effects that inadvertently alter unrelated demographic attributes, and that the bias remaining after these interventions tends to take the form of active discrimination rather than mere ignorance. Together, these results establish a quantifiable foundation for closed-loop fairness work in generative AI.

📝 Abstract

Text-to-Image (T2I) generative models have revolutionized content creation, yet they inherently risk amplifying societal biases. While sociological research provides systematic classifications of bias, existing T2I benchmarks largely conflate these nuances or focus narrowly on occupational stereotypes, leaving the multi-dimensional nature of generative bias inadequately measured. In this paper, we introduce BiasIG, a unified benchmark that quantifies social biases across a curated dataset of 47,040 prompts. Grounded in sociological and machine ethics frameworks, BiasIG disentangles biases across 4 dimensions to enable fine-grained diagnosis. To facilitate scalable and reliable evaluation, we propose a fully automated pipeline powered by a fine-tuned multi-modal large language model, achieving high alignment accuracy comparable to human experts. Extensive experiments on 8 T2I models and 3 debiasing methods not only validate BiasIG as a robust diagnostic tool, but also reveal critical insights: interventions on protected attributes often trigger unintended confounding effects on unrelated demographics, and debiasing methods exhibit a persistent tendency toward discrimination rather than mere ignorance. Our work advocates for a precise, taxonomy-driven approach to fairness in AIGC, providing a theoretical framework for using BiasIG's metrics as feedback signals in future closed-loop mitigation. The benchmark is openly available at https://github.com/Astarojth/BiasIG.

Problem

Research questions and friction points this paper is trying to address.

social bias

text-to-image models

bias benchmarking

multi-dimensional bias

generative AI

Innovation

Methods, ideas, or system contributions that make the work stand out.

BiasIG

text-to-image generation

social bias benchmarking