🤖 AI Summary
This study addresses a limitation of current explainable AI (XAI) approaches: their natural language explanations are mostly static lists of feature importances that lack narrative structure, which hinders human comprehension. The work defines four properties of narrative explanations (continuous structure, cause-effect mechanisms, linguistic fluency, and lexical diversity), introduces seven automatic evaluation metrics that quantify them, and proposes a set of problem-agnostic narrative generation rules. Drawing on natural language generation, linguistics, and the social sciences, the authors build a quantitative framework for assessing narrative quality in XAI. Experiments on six datasets show that the proposed metrics separate descriptive from narrative explanations more reliably than standard NLP metrics.
📝 Abstract
Explainable AI (XAI) aims to make the behaviour of machine learning models interpretable, yet many explanation methods remain difficult to understand. The integration of Natural Language Generation into XAI seeks to deliver explanations in textual form, making them more accessible to practitioners. Current approaches, however, largely yield static lists of feature importances. Although such explanations indicate what influences the prediction, they do not explain why the prediction occurs. In this study, we draw on insights from the social sciences and linguistics and argue that XAI explanations should be presented in the form of narratives. Narrative explanations support human understanding through four defining properties: continuous structure, cause-effect mechanisms, linguistic fluency, and lexical diversity. We show that standard Natural Language Processing (NLP) metrics based solely on token probability or word frequency fail to capture these properties and can be matched or exceeded by tautological text that conveys no explanatory content. To address this issue, we propose seven automatic metrics that quantify the narrative quality of explanations along the four identified dimensions. We benchmark current state-of-the-art explanation generation methods on six datasets and show that the proposed metrics separate descriptive from narrative explanations more reliably than standard NLP metrics. Finally, to further advance the field, we propose a set of problem-agnostic XAI Narrative generation rules for producing natural language XAI explanations, so that the resulting XAI Narratives exhibit stronger narrative properties and align with findings from the linguistics and social science literature.
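The point about word-frequency metrics can be illustrated with a small sketch. It assumes the type-token ratio (unique words divided by total words) as a stand-in for a frequency-based score; this is not one of the seven proposed metrics, and the two example sentences are invented for illustration rather than taken from the paper's datasets.

```python
import re

def type_token_ratio(text: str) -> float:
    """Return unique words / total words, a simple frequency-based score.

    Illustrative stand-in only; not a metric proposed in the paper.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)

# Hypothetical sentences: the first is circular and explains nothing,
# the second states a cause-effect mechanism.
TAUTOLOGY = "This applicant is rejected because the model predicts rejection."
EXPLANATION = (
    "The applicant is rejected because the low income raises the default risk."
)

if __name__ == "__main__":
    print("tautology  :", round(type_token_ratio(TAUTOLOGY), 3))
    print("explanation:", round(type_token_ratio(EXPLANATION), 3))
```

In this toy case the circular sentence scores at least as high as the causal one, since the score only counts word repetition and is blind to whether any mechanism is stated.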