AI Summary
This study investigates cultural biases in large language models (LLMs) when detecting ableist speech within the Indian sociocultural context. Method: We translated an English ableism dataset into Hindi, augmented it with ground-truth annotations from Indian disability communities, and used prompt engineering to conduct a cross-lingual, cross-cultural evaluation across eight models, including GPT-4, Gemini, and the India-developed Krutrim. Contribution/Results: Western models consistently overestimated harm, while Indian models tended to underestimate it; all models were more permissive toward ableist content expressed in Hindi, revealing misalignment between their training data and culturally grounded evaluation criteria. This work presents the first systematic analysis of geographically situated bias in AI-based ableism detection and proposes a community-informed, locally anchored evaluation framework to advance inclusive, responsible AI development for the Global South.
Abstract
People with disabilities (PwD) experience disproportionately high levels of discrimination and hate online, particularly in India, where entrenched stigma and limited resources intensify these challenges. Large language models (LLMs) are increasingly used to identify and mitigate online hate, yet most research on online ableism focuses on Western audiences and Western AI models. Are these models adequately equipped to recognize ableist harm in non-Western places like India? Do localized, Indic language models perform better? To investigate, we adapted a publicly available ableist speech dataset, translated it into Hindi, and prompted eight LLMs--four developed in the U.S. (GPT-4, Gemini, Claude, Llama) and four in India (Krutrim, Nanda, Gajendra, Airavata)--to score and explain ableism. In parallel, we recruited 175 PwD from both the U.S. and India to perform the same task, revealing stark differences between groups. Western LLMs consistently overestimated ableist harm, while Indic LLMs underestimated it. Even more concerning, all LLMs were more tolerant of ableism when it was expressed in Hindi, and all asserted Western framings of ableist harm. In contrast, Indian PwD interpreted harm through intention, relationality, and resilience--emphasizing a desire to inform and educate perpetrators. This work lays the groundwork for global, inclusive standards of ableism, demonstrating the need to center local disability experiences in the design and evaluation of AI systems.