🤖 AI Summary
This study systematically investigates religious bias in multilingual large language models (MLLMs) across Buddhist, Christian, Hindu, and Islamic contexts, revealing consistent cross-lingual disparities, with particularly pronounced negative stereotyping of Islam. To address this, the authors introduce BRAND, the first bilingual, auditable dataset covering South Asia's four major religions, comprising over 2,400 English–Bengali samples, and design diverse prompt templates for controlled comparative experiments. Quantitative and qualitative analyses demonstrate that state-of-the-art MLLMs consistently underperform in Bengali relative to English and perpetuate anti-Islamic bias even in ostensibly religion-neutral queries. This work is the first to uncover implicit structural imbalances in religious representation within multilingual LLMs, establishing a novel benchmark and methodological framework for assessing religious fairness in AI systems.
📝 Abstract
While recent developments in large language models have improved bias detection and classification, sensitive subjects like religion still present challenges because even minor errors can result in severe misunderstandings. In particular, multilingual models often misrepresent religions and struggle to remain accurate in religious contexts. To address this, we introduce BRAND: Bilingual Religious Accountable Norm Dataset, which focuses on the four main religions of South Asia: Buddhism, Christianity, Hinduism, and Islam. The dataset contains over 2,400 entries, and we evaluate models using three different types of prompts in both English and Bengali. Our results indicate that models perform better in English than in Bengali and consistently display bias toward Islam, even when answering religion-neutral questions. These findings highlight persistent bias in multilingual models when similar questions are asked in different languages. We further connect our findings to broader issues in HCI regarding religion and spirituality.