🤖 AI Summary
Severe information overload on modern web pages widens the efficiency gap between screen reader users (SRUs) and vision users (VUs): SRUs must sequentially traverse numerous irrelevant DOM elements, resulting in task completion times roughly twice those of VUs.
Method: Task Mode, an LLM-based dynamic content filtering system that parses the user's stated task to model and rank DOM element relevance, supporting multiple adaptive viewing modes. Crucially, it preserves the original page structure while enhancing task-relevant information and suppressing distractors.
Contribution/Results: The approach addresses visual and non-visual access needs within a single design. A user study with 12 participants (6 VUs, 6 SRUs) shows reduced SRU task completion time with VU performance maintained, narrowing the completion-time gap between groups from 2× to 1.2×. 11 of 12 participants wanted to continue using the system, citing less effort and fewer distractions.
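The pipeline described above (task intent → per-element relevance → structure-preserving filtered rendering) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the keyword-overlap scorer stands in for the paper's LLM relevance model, and the element/threshold names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class DomElement:
    tag: str
    text: str

def relevance(task: str, element: DomElement) -> float:
    """Stand-in scorer: fraction of task keywords appearing in the
    element's text. In Task Mode, an LLM would produce this score."""
    task_words = set(task.lower().split())
    elem_words = set(element.text.lower().split())
    if not task_words:
        return 0.0
    return len(task_words & elem_words) / len(task_words)

def filter_page(task: str, elements: list[DomElement],
                threshold: float = 0.3) -> list[tuple[DomElement, str]]:
    """Keep page order intact (structure is preserved); each element is
    marked 'enhanced' or 'suppressed' rather than removed from the DOM."""
    return [
        (el, "enhanced" if relevance(task, el) >= threshold else "suppressed")
        for el in elements
    ]

page = [
    DomElement("nav", "home shop deals account"),
    DomElement("h2", "return policy for online orders"),
    DomElement("aside", "sign up for our newsletter"),
]
for el, state in filter_page("find the return policy", page):
    print(el.tag, state)  # only the <h2> matches the task keywords
```

Marking elements instead of deleting them mirrors the structure-preservation point above: a screen reader can skip suppressed nodes while the page's landmark and heading hierarchy remains intact.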
📝 Abstract
Modern web interfaces are unnecessarily complex to use as they overwhelm users with excessive text and visuals unrelated to their current goals. This problem particularly impacts screen reader users (SRUs), who navigate content sequentially and may spend minutes traversing irrelevant elements before reaching desired information compared to vision users (VUs) who visually skim in seconds. We present Task Mode, a system that dynamically filters web content based on user-specified goals using large language models to identify and prioritize relevant elements while minimizing distractions. Our approach preserves page structure while offering multiple viewing modes tailored to different access needs. Our user study with 12 participants (6 VUs, 6 SRUs) demonstrates that our approach reduced task completion time for SRUs while maintaining performance for VUs, decreasing the completion time gap between groups from 2x to 1.2x. 11 of 12 participants wanted to use Task Mode in the future, reporting that Task Mode supported completing tasks with less effort and fewer distractions. This work demonstrates how designing new interactions simultaneously for visual and non-visual access can reduce rather than reinforce accessibility disparities in future technology created by human-computer interaction researchers and practitioners.