BIGbench: A Unified Benchmark for Evaluating Multi-dimensional Social Biases in Text-to-Image Models

📅 2024-07-21
📈 Citations: 8
Influential: 1
🤖 AI Summary
Text-to-image (T2I) models exhibit pervasive societal biases, yet existing evaluation methods lack fine-grained dimensional differentiation, hindering progress in fairness research. Method: We introduce BIGbench—the first multidimensional, unified benchmark for assessing societal bias in T2I models—systematically covering gender, race, age, and occupation dimensions. We propose a novel four-dimensional bias taxonomy and the first fully automated, high-accuracy, reproducible evaluation framework leveraging multimodal large language models (MLLMs). Evaluation integrates controlled generation comparisons and cross-cultural human validation (inter-annotator agreement: 92.3%). Contribution/Results: We conduct systematic evaluations across eight mainstream T2I models and three debiasing techniques, uncovering cross-dimensional bias correlations and other previously unobserved phenomena. We publicly release the benchmark dataset and evaluation toolkit to advance standardization in generative AI fairness assessment.
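The summary describes an automated pipeline in which an MLLM labels demographic attributes of generated images and bias is then quantified per dimension. As a minimal sketch of how such a per-dimension score might be computed, the snippet below measures the L1 deviation of the observed label distribution from a reference distribution (uniform by default). All names here are hypothetical illustrations, not BIGbench's actual metric or API.

```python
# Hypothetical sketch of a per-dimension bias score.
# Assumes an upstream MLLM classifier has already labeled each generated
# image with a demographic attribute; none of these names come from BIGbench.
from collections import Counter

def bias_score(labels, reference=None):
    """L1 deviation of the observed label distribution from a reference
    distribution (uniform by default). 0.0 = unbiased; 2.0 = maximal skew."""
    counts = Counter(labels)
    categories = sorted(counts)
    n = len(labels)
    if reference is None:
        reference = {c: 1.0 / len(categories) for c in categories}
    return sum(abs(counts[c] / n - reference.get(c, 0.0)) for c in categories)

# Example: gender labels assigned to 8 images generated for one prompt.
labels = ["male"] * 6 + ["female"] * 2
print(bias_score(labels))  # 0.5
```

Averaging such scores over many prompts within one dimension (gender, race, age, or occupation) would yield a dimension-level bias estimate, which is the kind of granular comparison the benchmark reports across models.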

📝 Abstract
Text-to-Image (T2I) generative models are becoming increasingly important due to their ability to generate high-quality images, but they also raise concerns about social biases, particularly in human image generation. Sociological research has established systematic classifications of bias, yet existing studies on bias in T2I models largely conflate different types of bias, impeding methodological progress. In this paper, we introduce BIGbench, a unified benchmark for Biases of Image Generation, featuring a carefully designed dataset. Unlike existing benchmarks, BIGbench classifies and evaluates biases across four dimensions, enabling more granular evaluation and deeper analysis. Furthermore, BIGbench applies advanced multi-modal large language models to achieve fully automated and highly accurate evaluations. We apply BIGbench to evaluate eight representative T2I models and three debiasing methods. Human evaluations conducted by trained evaluators of different races underscore BIGbench's effectiveness in aligning images and identifying various biases. Moreover, our study reveals new research directions on bias through insightful analysis of our results. Our work is openly accessible at https://github.com/BIGbench2024/BIGbench2024/.
Problem

Research questions and friction points this paper is trying to address.

Evaluating multi-dimensional social biases in T2I models
Lack of a unified benchmark for text-to-image bias evaluation
Need for automated, accurate bias identification
Innovation

Methods, ideas, or system contributions that make the work stand out.

Four-dimensional bias classification for granular analysis
Evaluation powered by multi-modal large language models
Fully automated, highly accurate assessment pipeline