🤖 AI Summary
This study addresses the high cost, labor intensity, and poor scalability of manual usability evaluation for recommender system user interfaces. We propose an automated assessment framework that leverages multimodal large language models (MLLMs). Given interface screenshots covering both preference elicitation and recommendation presentation scenarios, our method uses structured prompts to guide the MLLM through a fine-grained usability analysis grounded in established heuristics (e.g., Nielsen’s), yielding interpretable diagnostic feedback. Our key contribution is the first systematic application of MLLMs to recommender interface usability evaluation, sidestepping the semantic and contextual reasoning limitations of conventional automated tools. Experiments indicate that the approach supports heuristic-style usability assessments at low cost and at scale, accelerating user experience iteration for recommender systems.
📝 Abstract
Usability is a key factor in the effectiveness of recommender systems. However, analyzing user interfaces is a time-consuming process that requires expertise. Recent advances in multimodal large language models (LLMs) offer promising opportunities to automate such evaluations. In this work, we explore the potential of multimodal LLMs to assess the usability of recommender system interfaces, using a variety of publicly available systems as examples. We take user interface screenshots from several of these recommender platforms to cover both preference elicitation and recommendation presentation scenarios. The LLM is instructed to analyze these interfaces against different usability criteria and to provide explanatory feedback. Our evaluation demonstrates how LLMs can support heuristic-style usability assessments at scale, helping to improve the user experience.
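To make the setup concrete, below is a minimal sketch of how such a heuristic-guided screenshot assessment could be issued to a multimodal LLM. The model name, prompt wording, and use of the OpenAI Python SDK are illustrative assumptions; the paper does not publish its exact prompts or tooling.

```python
import base64
from openai import OpenAI

# Illustrative sketch, not the authors' implementation: the general pattern is
# "screenshot + heuristic-grounded instructions -> structured usability feedback".
client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

HEURISTIC_PROMPT = (
    "You are a usability expert. Analyze the attached recommender system "
    "interface screenshot against Nielsen's usability heuristics (e.g., "
    "visibility of system status, user control and freedom, consistency, "
    "error prevention). For each heuristic, rate compliance from 1 (poor) "
    "to 5 (excellent) and justify the rating with concrete references to "
    "interface elements."
)

def assess_screenshot(path: str) -> str:
    """Send one interface screenshot to a multimodal LLM for heuristic review."""
    with open(path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")

    response = client.chat.completions.create(
        model="gpt-4o",  # assumed; any multimodal chat model would do
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": HEURISTIC_PROMPT},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    # Hypothetical file name for a preference elicitation screen.
    print(assess_screenshot("preference_elicitation.png"))
```

In a full pipeline, the prompt would enumerate the exact heuristics and a fixed output format, so that per-heuristic scores and explanations can be parsed and aggregated across platforms and scenarios.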