GNN Explanations that do not Explain and How to Find Them

πŸ“… 2026-01-28
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This work addresses a critical reliability issue in self-explainable graph neural networks (SE-GNNs): they can produce degenerate explanations that are disconnected from the model's actual prediction logic, making their apparent interpretability misleading, and existing faithfulness metrics often fail to detect such failures. The study systematically characterizes these failure modes, showing that degenerate explanations can arise naturally or be maliciously planted to conceal the use of sensitive attributes. To address this, the authors propose a faithfulness evaluation framework, grounded in theoretical analysis and empirical validation, that introduces a new credibility metric for identifying untrustworthy explanations. Experiments show that the metric consistently detects degenerate explanations across diverse settings, outperforming existing approaches and improving the reliability of SE-GNN interpretations.

πŸ“ Abstract
Explanations provided by Self-explainable Graph Neural Networks (SE-GNNs) are fundamental for understanding the model's inner workings and for identifying potential misuse of sensitive attributes. Although recent works have highlighted that these explanations can be suboptimal and potentially misleading, a characterization of their failure cases is unavailable. In this work, we identify a critical failure of SE-GNN explanations: explanations can be unambiguously unrelated to how the SE-GNNs infer labels. We show that, on the one hand, many SE-GNNs can achieve optimal true risk while producing these degenerate explanations, and on the other, most faithfulness metrics can fail to identify these failure modes. Our empirical analysis reveals that degenerate explanations can be maliciously planted (allowing an attacker to hide the use of sensitive attributes) and can also emerge naturally, highlighting the need for reliable auditing. To address this, we introduce a novel faithfulness metric that reliably marks degenerate explanations as unfaithful, in both malicious and natural settings. Our code is available in the supplemental.
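The failure mode the abstract describes can be made concrete with a toy example (a minimal sketch; the `predict` function, feature masks, and zero baseline below are illustrative assumptions, not the paper's models or its proposed metric): a sufficiency-style fidelity check can be fooled when masking out the unexplained features happens to land on an input the model classifies the same way, so an explanation highlighting a feature the model never uses still passes.

```python
def predict(x):
    # Toy "model": the label depends only on feature 0.
    # (Illustrative stand-in for a GNN readout; not the paper's setup.)
    return 1 if x[0] >= 0 else 0

def sufficiency_fidelity(x, mask):
    # Sufficiency-style check: keep only the features the explanation
    # marks, zero out the rest, and test whether the prediction survives.
    masked = [v if keep else 0.0 for v, keep in zip(x, mask)]
    return predict(masked) == predict(x)

x = [0.9, 0.7]                    # feature 0 drives the prediction
faithful_mask = [True, False]     # highlights the feature actually used
degenerate_mask = [False, True]   # highlights a feature the model ignores

print(sufficiency_fidelity(x, faithful_mask))    # True
print(sufficiency_fidelity(x, degenerate_mask))  # also True: the 0.0
# baseline preserves the prediction (0 >= 0), so the degenerate
# explanation passes the check even though the model never reads it.
```

The degenerate mask passes only because the masking baseline stays in-distribution for the predicted class, which is one way a faithfulness metric can certify an explanation that is unrelated to the model's reasoning.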
Problem

Research questions and friction points this paper is trying to address.

Graph Neural Networks
Explainability
Faithfulness
Degenerate Explanations
Sensitive Attributes
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-explainable GNNs
Degenerate Explanations
Faithfulness Metrics
Explanation Auditing
Graph Neural Networks
πŸ”Ž Similar Papers
No similar papers found.