🤖 AI Summary
Existing mobile accessibility checkers evaluate static or mechanically generated contexts, which limits their ability to detect errors that affect app functionality, such as label-functionality mismatches, cluttered navigation, and inappropriate feedback. This paper introduces TaskAudit, a task-agent-based framework for dynamically detecting these functional accessibility errors. It synthesizes user tasks via large language models, executes them with agents that operate through a screen reader proxy, and analyzes the resulting interaction traces to uncover functional defects. Evaluated on 54 real-world app screens, TaskAudit detected 48 functional accessibility errors, whereas existing checkers detected between 4 and 20 on the same screens. By bringing task-oriented interaction simulation to accessibility evaluation, this work offers a new methodology and a practical tool for mobile accessibility testing.
📝 Abstract
Accessibility checkers are tools that support accessible app development, and accessibility best practices encourage their use. However, most current checkers evaluate static or mechanically generated contexts, failing to capture common accessibility errors that impact mobile app functionality. We present TaskAudit, an accessibility evaluation system that focuses on detecting functiona11ity errors through simulated interactions. TaskAudit comprises three components: a Task Generator that constructs interactive tasks from app screens, a Task Executor that uses agents with a screen reader proxy to perform these tasks, and an Accessibility Analyzer that detects and reports accessibility errors by examining interaction traces. Evaluation on real-world apps shows that our strategy detects 48 functiona11ity errors from 54 app screens, compared to between 4 and 20 with existing checkers. Our analysis demonstrates common error patterns that TaskAudit can detect beyond those covered by prior work, including label-functionality mismatch, cluttered navigation, and inappropriate feedback.
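The three-component pipeline (Task Generator, Task Executor, Accessibility Analyzer) can be illustrated with a toy sketch. This is not TaskAudit's implementation: the LLM-based Task Generator is replaced by a rule that creates one task per labeled element, the agent and screen reader proxy are replaced by linear swipe navigation over a hard-coded screen, and all names (`Element`, `generate_tasks`, `execute`, `analyze`, `max_swipes`) and detection rules are hypothetical. In particular, `Element.function` hands the executor ground truth that a real agent would have to infer from the screen after acting.

```python
from dataclasses import dataclass, field


# Hypothetical toy model; not TaskAudit's actual data structures.
@dataclass
class Element:
    label: str              # text the screen reader announces on focus
    function: str           # what activating the element really does (toy ground truth)
    feedback: str = ""      # announcement after activation; "" means silent


@dataclass
class Task:
    goal: str               # intended outcome, e.g. "share"
    target_label: str       # label the agent looks for while navigating


@dataclass
class Step:
    action: str             # "swipe" (move focus) or "double_tap" (activate)
    focused: str            # label announced at this step
    outcome: str = ""       # observed effect of an activation
    announcement: str = ""  # feedback spoken after an activation


@dataclass
class Trace:
    task: Task
    steps: list[Step] = field(default_factory=list)


def generate_tasks(screen: list[Element]) -> list[Task]:
    # Stand-in for the LLM-based Task Generator: one task per labeled element,
    # assuming each label states the element's purpose.
    return [Task(goal=e.label.lower(), target_label=e.label) for e in screen if e.label]


def execute(task: Task, screen: list[Element]) -> Trace:
    # Stand-in for the agent plus screen reader proxy: swipe through focus order
    # and double-tap the first element whose announced label matches the target.
    trace = Trace(task)
    for e in screen:
        trace.steps.append(Step("swipe", e.label))
        if e.label == task.target_label:
            trace.steps.append(Step("double_tap", e.label, e.function, e.feedback))
            break
    return trace


def analyze(trace: Trace, max_swipes: int = 5) -> list[str]:
    # Hypothetical trace rules for the three error patterns named in the abstract.
    findings = []
    swipes = sum(s.action == "swipe" for s in trace.steps)
    if swipes > max_swipes:
        findings.append(f"cluttered navigation: {swipes} swipes to reach '{trace.task.target_label}'")
    for s in trace.steps:
        if s.action != "double_tap":
            continue
        if s.outcome != trace.task.goal:
            findings.append(f"label-functionality mismatch: '{s.focused}' performed '{s.outcome}'")
        if not s.announcement:
            findings.append(f"missing feedback after activating '{s.focused}'")
    return findings


# Usage: the "Share" button actually deletes and gives no spoken feedback.
screen = [
    Element("Back", "back", "Navigated back"),
    Element("Share", "delete"),
    Element("Like", "like", "Liked"),
]
for task in generate_tasks(screen):
    for finding in analyze(execute(task, screen)):
        print(finding)
# prints:
# label-functionality mismatch: 'Share' performed 'delete'
# missing feedback after activating 'Share'
```

Keeping generation, execution, and analysis separate mirrors the decomposition in the abstract: in this sketch the analyzer sees only recorded traces, so the same rules could in principle be run on traces captured from a real device.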