🤖 AI Summary
This work identifies and quantifies both explicit and implicit caste-based biases in large language models (LLMs) against India's marginalized caste groups, particularly Dalits and Shudras, a critical yet underexplored dimension of LLM fairness research.
Method: We introduce the first multidimensional caste-bias analysis framework, spanning the socio-cultural, economic, educational, and political domains. Leveraging customized prompting strategies and novel multidimensional bias metrics, we systematically quantify both implicit and explicit bias. We conduct cross-model benchmarking across multiple state-of-the-art LLMs and perform social semantic embedding analysis to assess bias propagation.
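The paper's exact metric definitions are not reproduced in this summary; as a rough illustration of the idea, a per-dimension bias score can be framed as the gap in a stereotype measure between model completions for oppressed-caste and dominant-caste prompt variants. Everything in the sketch below (the prompt templates, group lists, keyword-based `stereotype_score` heuristic, and `generate` callable) is a hypothetical placeholder standing in for the framework's actual prompting strategies and metrics.

```python
# A minimal, hypothetical sketch of per-dimension bias scoring; NOT the
# paper's released implementation. Templates, group lists, and the
# keyword heuristic below are illustrative placeholders.
from typing import Callable

DIMENSIONS = {
    "socio-cultural": ["{group} families typically celebrate"],
    "economic":       ["A {group} worker usually earns"],
    "educational":    ["Students from {group} backgrounds often attend"],
    "political":      ["In local elections, {group} candidates tend to"],
}
OPPRESSED = ["Dalit", "Shudra"]
DOMINANT = ["Brahmin", "Kshatriya"]

def stereotype_score(completion: str) -> float:
    """Placeholder scorer: fraction of tokens that are negative stereotype
    markers. In practice this could be a stereotype or toxicity classifier."""
    negative_markers = {"poor", "menial", "uneducated", "powerless"}
    tokens = completion.lower().split()
    return sum(t in negative_markers for t in tokens) / max(len(tokens), 1)

def dimension_bias(generate: Callable[[str], str], dimension: str) -> float:
    """Mean stereotype-score gap (oppressed minus dominant) across one
    dimension's prompt templates; positive values indicate harsher
    completions for oppressed-caste prompts."""
    gaps = []
    for template in DIMENSIONS[dimension]:
        opp = [stereotype_score(generate(template.format(group=g))) for g in OPPRESSED]
        dom = [stereotype_score(generate(template.format(group=g))) for g in DOMINANT]
        gaps.append(sum(opp) / len(opp) - sum(dom) / len(dom))
    return sum(gaps) / len(gaps)
```

In the paper's setting, `generate` would call the LLM under evaluation, and scores of this kind would be aggregated over many templates per dimension and across models to support cross-model benchmarking.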
Contribution/Results: Empirical evaluation reveals significantly higher bias scores for Dalit- and Shudra-related queries than for those involving dominant caste groups, demonstrating that LLMs not only replicate but amplify real-world social inequities. Our framework provides a scalable, interpretable methodology for caste-bias detection and mitigation, filling a foundational gap in algorithmic fairness research and offering actionable insights for equitable AI development.
📝 Abstract
Recent advancements in large language models (LLMs) have revolutionized natural language processing (NLP) and expanded their applications across diverse domains. However, despite their impressive capabilities, LLMs have been shown to reflect and perpetuate harmful societal biases, including those based on ethnicity, gender, and religion. A critical and underexplored issue is the reinforcement of caste-based biases, particularly towards India's marginalized caste groups such as Dalits and Shudras. In this paper, we address this gap by proposing DECASTE, a novel, multi-dimensional framework designed to detect and assess both implicit and explicit caste biases in LLMs. Our approach evaluates caste fairness across four dimensions: socio-cultural, economic, educational, and political, using a range of customized prompting strategies. By benchmarking several state-of-the-art LLMs, we reveal that these models systematically reinforce caste biases, with significant disparities observed in the treatment of oppressed versus dominant caste groups. For example, bias scores are notably higher for queries involving Dalits and Shudras than for those involving dominant caste groups, reflecting societal prejudices that persist in model outputs. These results expose the subtle yet pervasive caste biases in LLMs and emphasize the need for more comprehensive and inclusive bias evaluation methodologies that assess the potential risks of deploying such models in real-world contexts.
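One way to make the social semantic embedding analysis mentioned in the summary concrete is a WEAT-style association test, which checks whether caste-group terms embed closer to negatively stereotyped attributes than dominant-group terms do. The sketch below is a generic illustration of that technique; the term lists, the `embed` callable, and the effect-size statistic are hypothetical stand-ins, not the paper's released procedure.

```python
# Generic WEAT-style association test; an illustrative stand-in, NOT the
# paper's exact embedding analysis. `embed` maps a word to a vector.
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def weat_effect(embed, targets_x, targets_y, attrs_a, attrs_b) -> float:
    """Effect-size statistic: how much more strongly the X terms associate
    with attribute set A (vs. B) than the Y terms do, in pooled-s.d. units."""
    A = [embed(w) for w in attrs_a]
    B = [embed(w) for w in attrs_b]

    def diff(word: str) -> float:
        # Mean similarity to attribute set A minus mean similarity to set B.
        vec = embed(word)
        return float(np.mean([cosine(vec, a) for a in A])
                     - np.mean([cosine(vec, b) for b in B]))

    x_scores = [diff(w) for w in targets_x]
    y_scores = [diff(w) for w in targets_y]
    pooled_sd = np.std(x_scores + y_scores, ddof=1)
    return float((np.mean(x_scores) - np.mean(y_scores)) / pooled_sd)

# Hypothetical usage with illustrative term lists:
# weat_effect(model_embed,
#             targets_x=["Dalit", "Shudra"],
#             targets_y=["Brahmin", "Kshatriya"],
#             attrs_a=["menial", "impure", "servile"],
#             attrs_b=["scholarly", "noble", "powerful"])
```

A positive effect size under this setup would indicate that oppressed-caste terms sit closer to the negative attribute set, consistent with the disparities the paper reports.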