🤖 AI Summary
Existing evaluation methods for model explanations—relying either on ground-truth annotations or strong model-sensitivity assumptions—are fundamentally limited by the absence of reliable, human-validated explanation labels. To address this, we propose AXE, the first ground-truth-agnostic and model-agnostic framework for evaluating local feature importance explanations. AXE’s core contribution lies in formalizing three axiomatic principles—consistency, stability, and separability—and deriving unsupervised, quantitative metrics from them. These metrics enable principled, standalone assessment of explanation quality and facilitate detection of “fairwashing”—i.e., spurious explanations that mask model bias. Extensive experiments across diverse models (e.g., LLMs, tree-based models, and neural networks) and benchmark datasets demonstrate that AXE consistently outperforms baseline methods that depend on ground truth or sensitivity analysis. The implementation is publicly available.
📝 Abstract
There can be many competing and contradictory explanations for a single model prediction, making it difficult to select which one to use. Current explanation evaluation frameworks measure quality by comparing against ideal "ground-truth" explanations, or by verifying model sensitivity to important inputs. We outline the limitations of these approaches, and propose three desirable principles to ground the future development of explanation evaluation strategies for local feature importance explanations. We propose a ground-truth Agnostic eXplanation Evaluation framework (AXE) for evaluating and comparing model explanations that satisfies these principles. Unlike prior approaches, AXE does not require access to ideal ground-truth explanations for comparison, or rely on model sensitivity, providing an independent measure of explanation quality. We verify AXE by comparing with baselines, and show how it can be used to detect explanation fairwashing. Our code is available at https://github.com/KaiRawal/Evaluating-Model-Explanations-without-Ground-Truth.
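To make the abstract's opening claim concrete, here is a hypothetical toy sketch (not the AXE method itself, and not from the paper): two common attribution schemes, applied to the very same linear-model prediction, can rank the features in different orders, so a practitioner has no obvious way to pick between them without an evaluation framework.

```python
# Toy illustration (hypothetical, not AXE): two plausible feature-importance
# explanations for the same linear prediction f(x) = w . x can disagree.

w = [3.0, -2.0, 0.5]  # model weights (assumed toy model)
x = [0.1, 1.0, 4.0]   # input instance being explained

# Scheme A: gradient-times-input style attribution (weight * feature value)
attr_a = [wi * xi for wi, xi in zip(w, x)]

# Scheme B: global-weight style attribution (absolute weight, ignores x)
attr_b = [abs(wi) for wi in w]

def rank(attrs):
    """Feature indices ordered from most to least important (by |attribution|)."""
    return sorted(range(len(attrs)), key=lambda i: -abs(attrs[i]))

print(rank(attr_a))  # ordering under scheme A
print(rank(attr_b))  # ordering under scheme B
```

Both explanations are internally defensible, yet they contradict each other on which feature matters most; this is the selection problem that a ground-truth-free evaluation framework such as AXE is intended to resolve.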