AI Summary
This study investigates whether existing algorithmic evaluation metrics for counterfactual explanations align with users' perceptions of explanation quality. Through user studies conducted on three datasets, the authors systematically compare widely used algorithmic metrics against multidimensional subjective ratings of counterfactual explanations collected from human participants. Employing correlation analyses and multivariate regression models, they assess the consistency and predictive power of these metrics. The findings reveal that algorithmic metrics generally exhibit weak correlations with human judgments and are highly dataset-dependent. Moreover, increasing the number of metrics yields only marginal improvements in predictive performance. These results expose structural limitations in current evaluation practices, underscoring their inability to capture key aspects of explanation quality that matter to users, and provide empirical support for advancing human-centered evaluation paradigms in explainable AI.
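The per-metric correlation analysis mentioned above can be sketched as follows. This is a minimal illustration, not the study's actual pipeline: the metric names (proximity, sparsity, plausibility), the 1-7 rating scale, and the synthetic data are all assumptions made for demonstration.

```python
# Sketch: rank-correlating algorithmic counterfactual metrics with human ratings.
# All column names, scales, and data are hypothetical placeholders.
import numpy as np
import pandas as pd
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n = 120  # hypothetical number of rated counterfactual explanations

# Synthetic stand-in data: three algorithmic metrics and a 1-7 human quality rating.
df = pd.DataFrame({
    "proximity":    rng.normal(0.5, 0.15, n),   # distance to the original instance
    "sparsity":     rng.integers(1, 6, n),      # number of changed features
    "plausibility": rng.normal(0.0, 1.0, n),    # e.g. a density or outlier score
    "human_rating": rng.integers(1, 8, n),      # subjective quality, 1-7 Likert
})

# Spearman rank correlation of each metric with the human ratings.
for metric in ["proximity", "sparsity", "plausibility"]:
    rho, p = spearmanr(df[metric], df["human_rating"])
    print(f"{metric:>12}: rho = {rho:+.2f}, p = {p:.3f}")
```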
Abstract
Explainability is widely regarded as essential for trustworthy artificial intelligence systems. However, the algorithmic metrics commonly used to evaluate counterfactual explanations are rarely validated against human judgments of explanation quality. This raises the question of whether such metrics meaningfully reflect user perceptions. We address this question through an empirical study that directly compares algorithmic evaluation metrics with human judgments across three datasets. Participants rated counterfactual explanations along multiple dimensions of perceived quality, which we relate to a comprehensive set of standard counterfactual metrics. We analyze both individual relationships and the extent to which combinations of metrics can predict human assessments. Our results show that correlations between algorithmic metrics and human ratings are generally weak and strongly dataset-dependent. Moreover, increasing the number of metrics used in predictive models does not lead to reliable improvements, indicating structural limitations in how current metrics capture criteria relevant to humans. Overall, our findings suggest that widely used counterfactual evaluation metrics fail to reflect key aspects of explanation quality as perceived by users, underscoring the need for more human-centered approaches to evaluating explainable artificial intelligence.
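The abstract's claim about combinations of metrics can be illustrated with a small sketch: fitting cross-validated regressions on growing subsets of metrics and checking whether predictive performance improves. The feature names, the linear model choice, and the synthetic data below are hypothetical assumptions, not the paper's metrics or results.

```python
# Sketch: does adding more algorithmic metrics improve prediction of human ratings?
# Hypothetical metric names and synthetic data; linear regression chosen for simplicity.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n = 120
metrics = ["proximity", "sparsity", "plausibility", "validity", "diversity"]
X = rng.normal(size=(n, len(metrics)))           # stand-in metric values
y = rng.integers(1, 8, n).astype(float)          # stand-in human ratings (1-7)

# Fit a model on the first k metrics and report cross-validated R^2 as k grows.
for k in range(1, len(metrics) + 1):
    scores = cross_val_score(LinearRegression(), X[:, :k], y, scoring="r2", cv=5)
    print(f"first {k} metric(s): mean CV R^2 = {scores.mean():+.3f}")
```

On data where the metrics carry little signal about the ratings, the cross-validated R^2 stays flat (or worsens) as metrics are added, which is the kind of pattern the abstract describes as a structural limitation.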