FAIntbench: A Holistic and Precise Benchmark for Bias Evaluation in Text-to-Image Models

📅 2024-05-28
🏛️ arXiv.org
📈 Citations: 5
Influential: 0
🤖 AI Summary
Existing research lacks a systematic definition and evaluation framework for biases in text-to-image (T2I) model outputs, hindering the development of debiasing techniques. To address this, we introduce FAIntbench—the first comprehensive bias evaluation benchmark specifically designed for T2I models—and propose a novel four-dimensional assessment framework: *bias manifestation*, *visibility*, *acquired attributes*, and *protected attributes*. We empirically uncover a new class of latent biases induced by mainstream techniques such as knowledge distillation. Leveraging multi-metric quantification, controllable prompt engineering, large-scale sampling, and double-blind human evaluation, we identify gender, racial, occupational, and other biases across seven state-of-the-art T2I models. FAIntbench is publicly released to enable reproducible, extensible, cross-model fairness analysis, thereby advancing the research ecosystem for fairness in generative AI.

📝 Abstract
The rapid development and reduced barriers to entry for Text-to-Image (T2I) models have raised concerns about biases in their outputs, but existing research lacks a holistic definition and evaluation framework for these biases, limiting the advancement of debiasing techniques. To address this issue, we introduce FAIntbench, a holistic and precise benchmark for biases in T2I models. In contrast to existing benchmarks that evaluate bias in limited aspects, FAIntbench evaluates biases along four dimensions: manifestation of bias, visibility of bias, acquired attributes, and protected attributes. We applied FAIntbench to evaluate seven recent large-scale T2I models and conducted a human evaluation, whose results demonstrate the effectiveness of FAIntbench in identifying various biases. Our study also revealed new research questions about biases, including the side effects of distillation. The findings presented here are preliminary, highlighting the potential of FAIntbench to advance future research aimed at mitigating biases in T2I models. Our benchmark is publicly available to ensure reproducibility.
Problem

Research questions and friction points this paper is trying to address.

Existing research lacks a holistic definition and evaluation framework for biases in T2I outputs.
Prior benchmarks assess bias only in limited aspects, hindering precise measurement.
The absence of systematic evaluation limits the development of effective debiasing techniques.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Holistic bias evaluation framework
Four-dimensional bias assessment
Public benchmark for reproducibility
Hanjun Luo
New York University Abu Dhabi
Trustworthy AI · Large Language Model · Text-to-Image
Ziye Deng
Zhejiang University, Hangzhou, Zhejiang, China
Ruizhe Chen
Zhejiang University
LLM · MLLM
Zuo-Qiang Liu
Zhejiang University, Hangzhou, Zhejiang, China