AI Summary
Existing rule-based mobile accessibility auditing tools suffer from limited coverage and struggle to detect semantic-level screen reader issues, such as missing contextual information or redundant utterances. To address this, we propose the first method that deeply integrates large language models (LLMs) into the mobile accessibility audit pipeline. Our approach combines UI traversal frameworks with screen reader behavioral modeling to automatically extract interface metadata and speech transcription texts, which an LLM then semantically analyzes and classifies into defect categories. We further introduce a novel multi-expert collaborative evaluation protocol to ensure diagnostic feedback aligns with real user needs. Evaluated on 14 real-world app screens, our method achieves a mean detection coverage of 69.2%, substantially outperforming mainstream tools (31.3%). Expert assessment confirms its superior diagnostic accuracy and practical utility.
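The audit pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: all names are hypothetical, and the LLM call is stood in for by simple heuristics that flag the two defect types the summary mentions.

```python
from dataclasses import dataclass

@dataclass
class ScreenData:
    """UI metadata plus the screen reader's spoken output for one screen."""
    ui_metadata: dict      # element tree: ids, labels, roles, bounds, etc.
    transcript: list[str]  # utterances a screen reader would announce

def audit_screen(screen: ScreenData) -> list[dict]:
    """Classify semantic-level screen reader defects for one screen.

    Hypothetical sketch: a real system would send the metadata and
    transcript to an LLM here; we use toy heuristics for illustration.
    """
    defects = []
    for el in screen.ui_metadata.get("elements", []):
        label = el.get("label", "")
        if not label:
            # No accessible label: the screen reader has nothing to announce.
            defects.append({"element": el.get("id"), "type": "missing_label"})
        elif screen.transcript.count(label) > 1:
            # Same text announced more than once: redundant utterance.
            defects.append({"element": el.get("id"), "type": "redundant_utterance"})
    return defects
```

A real implementation would also need the traversal step (driving the app to enumerate screens) and prompt construction for the LLM, both of which are out of scope for this sketch.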
Abstract
Many mobile apps are inaccessible, thereby excluding people from their potential benefits. Existing rule-based accessibility checkers aim to mitigate these failures by identifying errors early during development but are constrained in the types of errors they can detect. We present ScreenAudit, an LLM-powered system designed to traverse mobile app screens, extract metadata and transcripts, and identify screen reader accessibility errors overlooked by existing checkers. We recruited six accessibility experts, including one screen reader user, to evaluate ScreenAudit's reports across 14 unique app screens. Our findings indicate that ScreenAudit achieves an average coverage of 69.2%, compared to only 31.3% with a widely used accessibility checker. Expert feedback indicated that ScreenAudit delivered higher-quality feedback and addressed more aspects of screen reader accessibility than existing checkers, and that ScreenAudit would benefit app developers in real-world settings.