🤖 AI Summary
Existing self-explanatory graph neural networks (e.g., ProtGNN, PGIB) provide only instance-level prototype explanations and lack explicit modeling and empirical validation of class-level generalization. This paper proposes GraphOracle, the first framework enabling trainable and retroactively evaluable class-level self-explanation for GNNs. It jointly optimizes a classifier and discriminative sparse subgraphs via entropy-regularized subgraph selection, lightweight random-walk-based subgraph extraction, and end-to-end graph–subgraph–prediction training. A masking-based fidelity metric quantifies explanation faithfulness. Experiments on multiple graph classification benchmarks demonstrate significant improvements in explanation fidelity, interpretability, and computational efficiency. Notably, GraphOracle trains orders of magnitude faster than Monte Carlo tree search–based baselines. By unifying class-level reasoning with differentiable subgraph learning, GraphOracle establishes a novel paradigm for class-level explainable graph learning.
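As a rough illustration of the lightweight random-walk subgraph extraction the summary mentions, the sketch below grows a connected node set by walking a graph until a node budget is met. The adjacency-dict representation, node budget, and stopping rule are assumptions made for this sketch, not the paper's exact procedure:

```python
import random

def random_walk_subgraph(adj, start, max_nodes, seed=0):
    """Extract a connected subgraph via a simple random walk.

    adj: dict mapping each node to a list of neighbour nodes.
    Returns the set of visited nodes (the induced subgraph's node set).
    """
    rng = random.Random(seed)
    visited = {start}
    current = start
    # Walk until the node budget is met or we hit a dead end.
    while len(visited) < max_nodes:
        neighbours = adj.get(current, [])
        if not neighbours:
            break
        current = rng.choice(neighbours)
        visited.add(current)
    return visited

# Toy molecular-style graph: a 5-cycle with one pendant node.
adj = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0, 5], 5: [4]}
sub = random_walk_subgraph(adj, start=0, max_nodes=3)
```

Because the walk only ever steps to a neighbour of an already-visited node, the extracted node set is always connected, which keeps candidate explanation subgraphs structurally meaningful without any combinatorial search.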
📝 Abstract
Enhancing the interpretability of graph neural networks (GNNs) is crucial to ensure their safe and fair deployment. Recent work has introduced self-explainable GNNs that generate explanations as part of training, improving both faithfulness and efficiency. Some of these models, such as ProtGNN and PGIB, learn class-specific prototypes, offering a potential pathway toward class-level explanations. However, their evaluations focus solely on instance-level explanations, leaving open the question of whether these prototypes meaningfully generalize across instances of the same class. In this paper, we introduce GraphOracle, a novel self-explainable GNN framework designed to generate and evaluate class-level explanations for GNNs. Our model jointly learns a GNN classifier and a set of structured, sparse subgraphs that are discriminative for each class. We propose a novel integrated training procedure that captures graph–subgraph–prediction dependencies efficiently and faithfully, validated through a masking-based evaluation strategy. This strategy enables us to retroactively assess whether prior methods like ProtGNN and PGIB deliver effective class-level explanations. Our results show that they do not. In contrast, GraphOracle achieves superior fidelity, explainability, and scalability across a range of graph classification tasks. We further demonstrate that GraphOracle avoids the computational bottlenecks of previous methods, such as Monte Carlo Tree Search, by using entropy-regularized subgraph selection and lightweight random-walk extraction, enabling faster and more scalable training. These findings position GraphOracle as a practical and principled solution for faithful class-level self-explainability in GNNs.
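The masking-based evaluation strategy described above can be sketched as a fidelity score: how much the model's confidence drops when the explanation subgraph is masked out of the input. The `toy_predict` surrogate and the choice of "discriminative" nodes below are illustrative assumptions, not the paper's actual model or metric definition:

```python
def masking_fidelity(predict, graph_nodes, explanation_nodes):
    """Fidelity-style score: drop in the predicted class score when the
    explanation nodes are masked (removed) from the input graph.

    predict: hypothetical callable mapping a node set to a class score.
    """
    full_score = predict(graph_nodes)
    masked_score = predict(graph_nodes - explanation_nodes)
    return full_score - masked_score

# Toy surrogate "model": score proportional to how many of the
# (assumed) discriminative nodes {2, 3} remain in the input.
def toy_predict(nodes):
    return len(nodes & {2, 3}) / 2.0

graph = {0, 1, 2, 3, 4}
fid_good = masking_fidelity(toy_predict, graph, {2, 3})  # masks the signal -> 1.0
fid_poor = masking_fidelity(toy_predict, graph, {0, 4})  # masks noise -> 0.0
```

A faithful explanation should score high under this metric: removing it destroys the prediction, while removing an unfaithful one leaves the prediction intact. This is the property that lets prior prototype-based methods be assessed retroactively at the class level.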