🤖 AI Summary
This study addresses a gap in current computer-using agents (CUAs): they predominantly mimic the interactions of sighted users and do not accommodate the assistive technologies that blind and low-vision users rely on, which hinders inclusive collaboration. The work presents the first systematic characterization of CUAs’ limitations in accessibility contexts and introduces A11y-CUA, a benchmark dataset of interaction traces from both sighted and blind and low-vision users performing 60 everyday tasks. Through analysis of the interaction logs, simulation of assistive technology conditions (keyboard-only navigation and screen magnification), and comparative experiments with state-of-the-art CUAs, the study reveals significant discrepancies in interaction patterns. CUAs achieve a 78.3% task success rate under default conditions, but performance declines sharply to 41.67% under the keyboard-only condition and 28.3% under the magnifier condition, and CUA behavior does not reflect the nuances of real assistive technology use.
📝 Abstract
Computer Use Agents (CUAs) operate interfaces by pointing, clicking, and typing, mirroring the interactions of sighted users (SUs), who can thus monitor CUAs and share control. CUAs do not reflect the interactions of blind and low-vision users (BLVUs), who use assistive technology (AT), so BLVUs cannot easily collaborate with CUAs. To characterize the accessibility gap of CUAs, we present A11y-CUA, a dataset of BLVUs and SUs performing 60 everyday tasks, comprising 40.4 hours of interaction and 158,325 events. Our analysis of the collected interaction traces quantitatively confirms distinct interaction styles between the SU and BLVU groups (mouse- vs. keyboard-dominant) and demonstrates interaction diversity within each group (sequential vs. shortcut navigation for BLVUs). We then compare the collected traces to state-of-the-art CUAs under default and AT conditions (keyboard-only, magnifier). The default CUA completed 78.3% of tasks successfully. Under the AT conditions, CUA performance dropped to 41.67% (keyboard-only) and 28.3% (magnifier), and CUA behavior did not reflect the nuances of real AT use. With our open A11y-CUA dataset, we aim to promote collaborative and accessible CUAs for everyone.
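As a rough illustration of what "mouse- vs. keyboard-dominant" could mean operationally, the sketch below labels a trace by the share of keyboard versus mouse events. It is not the authors' analysis: the event schema, the event-type names, and the toy trace are all hypothetical and would need to be adapted to the actual A11y-CUA log format.

```python
from collections import Counter

# Hypothetical event-type labels; the real A11y-CUA schema may differ.
KEYBOARD_TYPES = {"key_down", "key_up"}
MOUSE_TYPES = {"mouse_move", "mouse_click", "mouse_scroll"}


def modality_share(events):
    """Return (keyboard_share, mouse_share), ignoring all other event types."""
    counts = Counter()
    for event in events:
        kind = event["type"]
        if kind in KEYBOARD_TYPES:
            counts["keyboard"] += 1
        elif kind in MOUSE_TYPES:
            counts["mouse"] += 1
    total = counts["keyboard"] + counts["mouse"]
    if total == 0:
        # No keyboard or mouse events at all: shares are undefined, report zeros.
        return 0.0, 0.0
    return counts["keyboard"] / total, counts["mouse"] / total


def dominant_modality(events):
    """Label a trace 'keyboard', 'mouse', or 'mixed' when the shares tie."""
    keyboard, mouse = modality_share(events)
    if keyboard == mouse:
        return "mixed"
    return "keyboard" if keyboard > mouse else "mouse"


# Toy trace invented for illustration; it is not data from the paper.
toy_trace = [
    {"type": "key_down"},
    {"type": "key_up"},
    {"type": "key_down"},
    {"type": "mouse_click"},
]

print(dominant_modality(toy_trace))
```

A simple majority rule like this ignores within-group variation such as sequential versus shortcut navigation, which would require looking at which keys are pressed rather than only at the input device.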