🤖 AI Summary
This study identifies systematic inclusivity gaps in large language models’ (LLMs) accessibility support: while visual, auditory, and motor impairments receive relatively greater attention, speech, genetic/developmental, sensory-cognitive, and mental health disabilities remain chronically marginalized. Method: We construct the first human-validated, general-purpose accessibility benchmark and propose a three-dimensional evaluation framework—assessing coverage breadth, category balance, and response specificity—integrating taxonomy-aligned benchmark design, taxonomy-aware prompting, and training strategy exploration. Contribution/Results: Quantitative evaluation across 17 mainstream LLMs reveals markedly lower response coverage and shallower support depth for the four underrepresented disability categories. Our findings empirically confirm a pronounced structural imbalance in current LLMs’ accessibility capabilities, establishing a reproducible benchmark and evidence-based foundation for rigorous accessibility assessment and fairness-oriented model improvement.
📝 Abstract
Large Language Models (LLMs) are increasingly used for accessibility guidance, yet many disability groups remain underserved by their advice. To address this gap, we present a taxonomy-aligned benchmark of human-validated, general-purpose accessibility questions, designed to systematically audit inclusivity across disabilities. Our benchmark evaluates models along three dimensions: Question-Level Coverage (breadth within answers), Disability-Level Coverage (balance across nine disability categories), and Depth (specificity of support). Applying this framework to 17 proprietary and open-weight models reveals persistent inclusivity gaps: Vision, Hearing, and Mobility are frequently addressed, while Speech, Genetic/Developmental, Sensory-Cognitive, and Mental Health remain underserved. Depth is similarly concentrated in a few categories but sparse elsewhere. These findings reveal who gets left behind in current LLM accessibility guidance and highlight actionable levers: taxonomy-aware prompting/training, and evaluations that jointly audit breadth, balance, and depth.
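The two coverage dimensions above can be sketched as simple set-based scores. This is a minimal illustration, not the paper's actual implementation: the category labels are taken from those named in the abstract (the full taxonomy has nine), and the idea that each answer is annotated with the categories it addresses is an assumption about the evaluation pipeline.

```python
from collections import defaultdict

# Categories named in the abstract (assumed labels; the paper's
# taxonomy has nine categories in total).
CATEGORIES = [
    "Vision", "Hearing", "Mobility", "Speech",
    "Genetic/Developmental", "Sensory-Cognitive", "Mental Health",
]

def question_level_coverage(addressed, relevant):
    """Breadth within one answer: fraction of the categories relevant
    to the question that the model's answer actually addresses."""
    relevant = set(relevant)
    return len(set(addressed) & relevant) / len(relevant)

def disability_level_coverage(answers):
    """Balance across categories: for each category, the share of
    answers in which it is addressed at least once."""
    counts = defaultdict(int)
    for addressed in answers:
        for cat in set(addressed):
            counts[cat] += 1
    n = len(answers)
    return {cat: counts[cat] / n for cat in counts}
```

A skewed `disability_level_coverage` result (e.g., Vision near 1.0 while Mental Health sits near 0) is exactly the structural imbalance the abstract reports; Depth would additionally score how specific the support is within each addressed category.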