AI Summary
This study systematically detects and mitigates religious bias in large language models (LLMs) and text-to-image (T2I) diffusion models. We construct a benchmark of 400 natural prompts to quantify bias across masked language modeling, text continuation, and image generation tasks, enabling, for the first time, cross-modal (text + image) religious bias measurement. We further incorporate demographic dimensions (gender, age, nationality) to uncover intersectional coupling mechanisms between religious bias and sociodemographic attributes. To address bias, we propose a lightweight, fine-tuning-free debiasing method based on calibrated prompting. Empirical evaluation across mainstream open- and closed-source models reveals pervasive systemic negative stereotyping toward specific religions. Our method reduces bias metrics by an average of 37.2%, demonstrating strong efficacy. Key contributions include: (1) the first cross-modal religious bias benchmark; (2) empirical characterization of multidimensional intersectional bias; and (3) a practical, deployable prompt-level intervention strategy.
Abstract
Note: This paper includes examples of potentially offensive content related to religious bias, presented solely for academic purposes. The widespread adoption of language models highlights the need for critical examinations of their inherent biases, particularly concerning religion. This study systematically investigates religious bias in both language models and text-to-image generation models, analyzing both open-source and closed-source systems. We construct approximately 400 unique, naturally occurring prompts to probe language models for religious bias across diverse tasks, including mask filling, prompt completion, and image generation. Our experiments reveal concerning instances of underlying stereotypes and biases associated disproportionately with certain religions. Additionally, we explore cross-domain biases, examining how religious bias intersects with demographic factors such as gender, age, and nationality. This study further evaluates the effectiveness of targeted debiasing techniques by employing corrective prompts designed to mitigate the identified biases. Our findings demonstrate that language models continue to exhibit significant biases in both text and image generation tasks, emphasizing the urgent need to develop fairer language models to achieve global acceptability.
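The corrective-prompt intervention described above can be sketched as a simple prompt-wrapping step applied before querying a model. The preamble wording, function name, and example probe below are illustrative assumptions for exposition, not the paper's actual prompts or method details.

```python
# Hypothetical sketch of prompt-level debiasing: prepend a fairness
# instruction to the raw probe before sending it to a model.
# CORRECTIVE_PREAMBLE and apply_corrective_prompt are illustrative
# names, not taken from the paper.

CORRECTIVE_PREAMBLE = (
    "Respond without relying on stereotypes about any religion; "
    "treat all religious groups neutrally and factually."
)

def apply_corrective_prompt(prompt: str,
                            preamble: str = CORRECTIVE_PREAMBLE) -> str:
    """Wrap a raw probe prompt with a debiasing instruction."""
    return f"{preamble}\n\n{prompt}"

# Example: wrapping a natural-language probe prompt.
probe = "A person of [RELIGION] walked into the room and"
wrapped = apply_corrective_prompt(probe)
print(wrapped)
```

The same wrapper can be applied uniformly across mask-filling, continuation, and image-generation probes, which is what makes a prompt-level strategy deployable without fine-tuning.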