🤖 AI Summary
Traditional UI usability evaluation is resource-intensive, expert-dependent, and thus inaccessible to small organizations.
Method: This paper proposes an automated evaluation framework based on multimodal large language models (MLLMs), formalizing usability assessment as a three-stage recommendation task—problem identification, severity ranking, and improvement suggestion generation—and jointly modeling interface text, visual screenshots, and DOM structure without manual annotation.
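The three-stage formulation can be illustrated with a minimal, hypothetical sketch. Everything here is an illustrative assumption, not the paper's actual API: `ask_mllm` stands in for any multimodal-LLM call that accepts interface text, a screenshot, and DOM structure.

```python
# Hypothetical sketch of the three-stage recommendation pipeline.
# `ask_mllm` is a placeholder for a multimodal LLM call; all names
# are illustrative assumptions, not the paper's implementation.

def evaluate_ui(ui_text, screenshot, dom, ask_mllm):
    # Stage 1: problem identification over text, visual, and DOM modalities
    issues = ask_mllm("identify_issues", ui_text, screenshot, dom)
    # Stage 2: severity ranking of the identified issues
    ranked = ask_mllm("rank_by_severity", issues)
    # Stage 3: improvement-suggestion generation, one per ranked issue
    suggestions = [ask_mllm("suggest_fix", issue) for issue in ranked]
    return list(zip(ranked, suggestions))
```

Because the model is passed in as a callable, each stage can be swapped or evaluated independently, which mirrors how the paper compares each stage against expert assessments.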
Contribution/Results: The end-to-end method achieves human-expert-level performance: high problem-identification accuracy, strong agreement with expert severity rankings (Kendall's τ = 0.78), and high-quality suggestions (expert-rated 4.2/5.0). By eliminating reliance on domain experts and labeled data, it significantly lowers the barrier to usability evaluation, improving the accessibility and scalability of usability engineering.
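The Kendall's τ figure reported above measures rank agreement between two severity orderings of the same issues. As a self-contained illustration (with made-up ranks, not the paper's data), τ can be computed from concordant and discordant pairs:

```python
from itertools import combinations

def kendall_tau(rank_a, rank_b):
    """Kendall's tau-a: rank agreement between two orderings (1 = identical)."""
    n = len(rank_a)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (rank_a[i] - rank_a[j]) * (rank_b[i] - rank_b[j])
        if s > 0:
            concordant += 1   # both raters order this pair the same way
        elif s < 0:
            discordant += 1   # the raters disagree on this pair
    return (concordant - discordant) / (n * (n - 1) / 2)

# Hypothetical severity ranks for five usability issues (1 = most severe):
llm_ranks    = [1, 2, 3, 4, 5]
expert_ranks = [1, 3, 2, 4, 5]
print(kendall_tau(llm_ranks, expert_ranks))  # → 0.8
```

A single swapped pair among five issues already drops τ to 0.8, so the reported 0.78 indicates the LLM and expert orderings disagree on only a small fraction of issue pairs.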
📝 Abstract
Usability describes a set of essential quality attributes of user interfaces (UIs) that influence human-computer interaction. Common evaluation methods, such as usability testing and inspection, are effective but resource-intensive and require expert involvement. This makes them less accessible for smaller organizations. Recent advances in multimodal LLMs offer promising opportunities to partially automate usability evaluation by analyzing textual, visual, and structural aspects of software interfaces. To investigate this possibility, we formulate usability evaluation as a recommendation task, where multimodal LLMs rank usability issues by severity. We conducted an initial proof-of-concept study to compare LLM-generated usability improvement recommendations with usability expert assessments. Our findings indicate the potential of LLMs to enable faster and more cost-effective usability evaluation, making them a practical alternative in contexts with limited expert resources.