🤖 AI Summary
Problem: Current user evaluations of eXplainable Artificial Intelligence (XAI) in healthcare lack systematic frameworks and practical guidelines, leading to insufficient validation of trustworthiness and usability. Method: We conducted a systematic literature review across multiple databases (e.g., PubMed, ACM Digital Library), coding and analyzing 82 user evaluation studies conducted in clinical or medical contexts. Contribution/Results: We introduce the first framework of atomic Explanatory Experience Attributes tailored specifically to healthcare XAI, uncovering intrinsic relationships among explanatory properties. Building on this, we propose a sensitivity-based evaluation guideline grounded in both system characteristics and clinical context. Our contributions include an updated, validated evaluation framework; interdisciplinary, practice-oriented implementation protocols; and an analysis of emerging empirical trends. Collectively, these advances substantially improve the verifiability of XAI systems' trustworthiness and usability in real-world clinical settings.
📝 Abstract
Despite promising developments in Explainable Artificial Intelligence (XAI), the practical value of XAI methods remains under-explored and insufficiently validated in real-world settings. Robust and context-aware evaluation is essential, not only to produce understandable explanations but also to ensure their trustworthiness and usability for intended users; yet such evaluation tends to be overlooked because there are no clear guidelines on how to design evaluations with users. This study addresses this gap with two main goals: (1) to develop a framework of well-defined, atomic properties that characterise the user experience of XAI in healthcare; and (2) to provide clear, context-sensitive guidelines for defining evaluation strategies based on system characteristics. We conducted a systematic review of 82 user studies, sourced from five databases, all situated within healthcare settings and focused on evaluating AI-generated explanations. The analysis was guided by a predefined coding scheme informed by an existing evaluation framework, complemented by inductive codes developed iteratively. The review yields three key contributions: (1) a synthesis of current evaluation practices, highlighting a growing focus on human-centred approaches in healthcare XAI; (2) insights into the interrelations among explanation properties; and (3) an updated framework and a set of actionable guidelines to support interdisciplinary teams in designing and implementing effective evaluation strategies for XAI systems tailored to specific application contexts.