🤖 AI Summary
Traditional GUI usability evaluation relies heavily on expert reviews and user testing, which are costly and time-intensive, while existing computational agents struggle to assess usability accurately. This work proposes uxCUA, a computer use agent (CUA) trained with a machine learning method that combines a computational definition of usability with a large-scale dataset of fully interactive UIs paired with usability labels and human preferences, enabling end-to-end prediction of a numerical usability score. By prioritizing important interaction flows and executing them through human-like interactions, uxCUA generates realistic usability critiques. Notably, it assesses usability more accurately than larger models and is effective on both synthetic and real UIs.
📝 Abstract
Usability testing with experts and potential users can assess the effectiveness, efficiency, and user satisfaction of graphical user interfaces (GUIs), but doing so remains a costly and time-intensive process. Prior work has used computer use agents (CUAs) and other generative agents that can simulate user interactions and preferences, but we show that agents still struggle to provide accurate usability assessments. In this work, we present a novel machine learning method that operationalizes a computational definition of usability to train CUAs to assess GUI usability by i) prioritizing important interaction flows, ii) executing them through human-like interactions, and iii) predicting a learned numerical usability score. We train a computer use agent, uxCUA, with our algorithm on a large-scale dataset of fully interactive user interfaces (UIs) paired with usability labels and human preferences. We show that uxCUA assesses usability more accurately than larger models and produces realistic critiques of both synthetic and real UIs. More broadly, our work aims to build a principled, data-driven foundation for automated usability assessment in HCI.