AutoDebias: Automated Framework for Debiasing Text-to-Image Models

📅 2025-08-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
Text-to-image (T2I) models frequently reproduce societal biases—such as gender and racial stereotypes—especially in cases of intersectional and latent bias; existing debiasing methods rely heavily on predefined bias categories and exhibit poor generalization. This paper introduces the first framework that automatically discovers and mitigates complex, intersecting biases without requiring prior knowledge of bias types. Our approach leverages vision-language models (VLMs) for end-to-end bias detection, integrates CLIP-guided fairness-aware fine-tuning, and employs inclusive prompt generation—all while preserving image quality. Evaluated across a benchmark encompassing 25+ bias scenarios, our method achieves 91.6% bias detection accuracy and reduces biased outputs from 90% to under 1%, with no degradation in visual fidelity. The core contribution is the first fully automated, controllable identification and mitigation of unknown, overlapping, and latent social biases in T2I generation.

📝 Abstract
Text-to-Image (T2I) models generate high-quality images from text prompts but often exhibit unintended social biases, such as gender or racial stereotypes, even when these attributes are not mentioned. Existing debiasing methods work well for simple or well-known cases but struggle with subtle or overlapping biases. We propose AutoDebias, a framework that automatically identifies and mitigates harmful biases in T2I models without prior knowledge of specific bias types. Specifically, AutoDebias leverages vision-language models to detect biased visual patterns and constructs fairness guides by generating inclusive alternative prompts that reflect balanced representations. These guides drive a CLIP-guided training process that promotes fairer outputs while preserving the original model's image quality and diversity. Unlike existing methods, AutoDebias effectively addresses both subtle stereotypes and multiple interacting biases. We evaluate the framework on a benchmark covering over 25 bias scenarios, including challenging cases where multiple biases occur simultaneously. AutoDebias detects harmful patterns with 91.6% accuracy and reduces biased outputs from 90% to negligible levels, while preserving the visual fidelity of the original model.
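The "inclusive alternative prompts" mentioned in the abstract can be pictured as expanding a base prompt over demographic attribute axes so every combination is represented equally. The sketch below is purely illustrative: the hard-coded `ATTRIBUTES` axes and the `inclusive_variants` helper are assumptions, whereas the paper discovers bias axes automatically via a vision-language model.

```python
import itertools

# Illustrative attribute axes; AutoDebias discovers such axes
# automatically, here they are hard-coded for the sketch.
ATTRIBUTES = {
    "gender": ["female", "male", "non-binary"],
    "ethnicity": ["Asian", "Black", "White", "Hispanic"],
}

def inclusive_variants(prompt: str, subject: str = "doctor") -> list[str]:
    """Expand one prompt into a balanced set of attribute-qualified variants."""
    variants = []
    for combo in itertools.product(*ATTRIBUTES.values()):
        qualifier = " ".join(combo)
        variants.append(prompt.replace(subject, f"{qualifier} {subject}"))
    return variants

prompts = inclusive_variants("a photo of a doctor")
# one balanced variant per (gender, ethnicity) combination → 3 × 4 = 12 prompts
```

A set like this serves as the "fairness guide": if generated images cluster around only a few of the variants, the distribution is biased.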
Problem

Research questions and friction points this paper is trying to address.

How to automatically detect and mitigate social biases in text-to-image models
How to address subtle and overlapping biases without prior knowledge of bias types
How to significantly reduce biased outputs while preserving image quality
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automatically identifies and mitigates biases
Uses vision-language models for detection
CLIP-guided training for fair outputs
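The CLIP-guided training signal can be approximated as a fairness penalty: embed a batch of generated images and the inclusive prompt variants with CLIP, then penalize skew of image-text affinity toward any single variant. Below is a minimal NumPy stand-in (unit vectors in place of real CLIP embeddings); the `fairness_penalty` function and its variance formulation are an assumption for illustration, not the authors' actual loss.

```python
import numpy as np

def cosine_sim(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    # Cosine similarity between each row of a and each row of b.
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

def fairness_penalty(image_embs: np.ndarray,
                     variant_prompt_embs: np.ndarray) -> float:
    """Measure skew of a batch of generated images toward any one
    inclusive prompt variant (e.g. "a female doctor" vs "a male doctor").

    image_embs:          (n_images, d) image embeddings (CLIP-like)
    variant_prompt_embs: (n_variants, d) text embeddings of the variants
    Returns the variance of mean image-text similarity across variants;
    0 means perfectly balanced representation.
    """
    sims = cosine_sim(image_embs, variant_prompt_embs)  # (n_images, n_variants)
    per_variant = sims.mean(axis=0)  # mean affinity toward each variant
    return float(per_variant.var())
```

During fine-tuning, a term like this would be added to the usual generation objective so that lowering the penalty balances outputs across variants without touching image quality directly.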
Hongyi Cai
University of Malaya
Data-centric AI · AI for Efficiency · Computer Vision
Mohammad Mahdinur Rahman
Faculty of Computer Science and Information Technology, Universiti Malaya
M
Mingkang Dong
Faculty of Computer Science and Information Technology, Universiti Malaya
Jie Li
Muxin Pu
Monash University
Software Testing · Computer Vision
Zhili Fang
Faculty of Computer Science and Information Technology, Universiti Malaya
Yinan Peng
Nanyang Technological University
Hanjun Luo
New York University Abu Dhabi
Trustworthy AI · Large Language Model · Text-to-Image
Yang Liu
Nanyang Technological University