🤖 AI Summary
Existing evaluation methods for model explanations—relying either on ground-truth annotations or strong model-sensitivity assumptions—are fundamentally limited by the absence of reliable, human-validated explanation labels. To address this, we propose AXE, the first ground-truth-agnostic and model-agnostic framework for evaluating local feature importance explanations. AXE’s core contribution lies in formalizing three axiomatic principles—consistency, stability, and separability—and deriving unsupervised, quantitative metrics from them. These metrics enable principled, standalone assessment of explanation quality and facilitate detection of “fairwashing”—i.e., spurious explanations that mask model bias. Extensive experiments across diverse models (e.g., LLMs, tree-based models, and neural networks) and benchmark datasets demonstrate that AXE consistently outperforms baseline methods that depend on ground truth or sensitivity analysis. The implementation is publicly available.
📝 Abstract
There can be many competing and contradictory explanations for a single model prediction, making it difficult to select which one to use. Current explanation evaluation frameworks measure quality by comparing against ideal "ground-truth" explanations, or by verifying model sensitivity to important inputs. We outline the limitations of these approaches, and propose three desirable principles to ground the future development of explanation evaluation strategies for local feature importance explanations. We propose a ground-truth Agnostic eXplanation Evaluation framework (AXE) for evaluating and comparing model explanations that satisfies these principles. Unlike prior approaches, AXE does not require access to ideal ground-truth explanations for comparison, or rely on model sensitivity, providing an independent measure of explanation quality. We verify AXE by comparing with baselines, and show how it can be used to detect explanation fairwashing. Our code is available at https://github.com/KaiRawal/Evaluating-Model-Explanations-without-Ground-Truth.
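To make the abstract's opening claim concrete, here is a hypothetical toy sketch (not the AXE method itself, and not from the paper): two common attribution schemes, applied to the very same linear-model prediction, can rank the features in different orders, so a practitioner has no obvious way to pick between them without an evaluation framework.

```python
# Toy illustration (hypothetical, not AXE): two plausible feature-importance
# explanations for the same linear prediction f(x) = w . x can disagree.

w = [3.0, -2.0, 0.5]  # model weights (assumed toy model)
x = [0.1, 1.0, 4.0]   # input instance being explained

# Scheme A: gradient-times-input style attribution (weight * feature value)
attr_a = [wi * xi for wi, xi in zip(w, x)]

# Scheme B: global-weight style attribution (absolute weight, ignores x)
attr_b = [abs(wi) for wi in w]

def rank(attrs):
    """Feature indices ordered from most to least important (by |attribution|)."""
    return sorted(range(len(attrs)), key=lambda i: -abs(attrs[i]))

print(rank(attr_a))  # ordering under scheme A
print(rank(attr_b))  # ordering under scheme B
```

Both explanations are internally defensible, yet they contradict each other on which feature matters most; this is the selection problem that a ground-truth-free evaluation framework such as AXE is intended to resolve.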