Addressing the Scarcity of Benchmarks for Graph XAI

📅 2025-05-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
Graph neural networks (GNNs) lack interpretability in graph classification, hindering their deployment in safety-critical applications. Existing eXplainable AI (XAI) benchmarks for graphs rely heavily on synthetic data or small-scale manual annotations, and therefore lack realistic, verifiable ground-truth subgraph explanations. To address this, the authors propose the first automated framework for constructing XAI benchmarks on real-world graph data, integrating graph structural analysis with controllable label injection to enable scalable, principled generation of ground-truth explanations. They publicly release 15 ready-to-use, real-world benchmarks and an open-source toolkit capable of generating over 2,000 additional benchmarks. Furthermore, they conduct the first systematic, unified evaluation of mainstream graph explainers across these benchmarks, uncovering significant performance disparities and fundamental limitations. This work addresses a longstanding bottleneck in graph XAI evaluation: the severe scarcity of realistic, ground-truth-anchored benchmarks.
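The core idea of controllable label injection can be sketched in a few lines (a minimal illustration under assumed details, not the authors' actual pipeline): assign each real-world graph a binary class label according to whether it contains an automatically verifiable structural motif, so the motif's edges become the ground-truth explanation. Here a triangle stands in for an arbitrary motif, and graphs are plain edge lists:

```python
def find_triangle(edges):
    """Return the edge set of one triangle in the graph, or None.

    A triangle serves here as a stand-in for any structural motif
    whose presence can be verified automatically.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    for u, v in edges:
        for w in adj[u] & adj[v]:  # common neighbor closes a triangle
            return {frozenset((u, v)), frozenset((v, w)), frozenset((u, w))}
    return None

def inject_labels(graphs):
    """Relabel real-world graphs by motif presence.

    Each graph becomes (edges, label, ground_truth_edges): label is 1
    iff the motif occurs, and the motif's edges are the explanation
    target an explainer should recover.
    """
    benchmark = []
    for edges in graphs:
        motif = find_triangle(edges)
        benchmark.append((edges, int(motif is not None), motif or set()))
    return benchmark

graphs = [
    [(0, 1), (1, 2), (2, 0), (2, 3)],   # contains a triangle
    [(0, 1), (1, 2), (2, 3)],           # path: no triangle
]
bench = inject_labels(graphs)
```

Because the label is defined by the motif, the ground-truth explanation is correct by construction, which is what makes the benchmark generation scalable without expert annotation.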

📝 Abstract
While Graph Neural Networks (GNNs) have become the de facto model for learning from structured data, their decision-making process remains opaque to the end user, undermining their deployment in safety-critical applications. In the case of graph classification, Explainable Artificial Intelligence (XAI) techniques address this major issue by identifying subgraph motifs that explain predictions. However, advancements in this field are hindered by a chronic scarcity of benchmark datasets with known ground-truth motifs against which to assess the quality of explanations. Current graph XAI benchmarks are limited to synthetic data or a handful of real-world tasks hand-curated by domain experts. In this paper, we propose a general method to automate the construction of XAI benchmarks for graph classification from real-world datasets. We provide 15 ready-made benchmarks, as well as the code to generate more than 2,000 additional XAI benchmarks with our method. As a use case, we employ our benchmarks to assess the effectiveness of some popular graph explainers.
Problem

Research questions and friction points this paper is trying to address.

Lack of benchmark datasets for graph XAI evaluation
Existing benchmarks limited to synthetic data or a few expert-curated datasets
Need for an automated method to create diverse real-world XAI benchmarks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automated construction of XAI benchmarks for graphs
Toolkit can generate 2,000+ additional benchmarks from real-world datasets
Provides 15 ready-made benchmarks for evaluation
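With ground-truth motifs available, explainer quality can be scored as a standard edge-level overlap between the predicted explanation and the injected motif. The sketch below (illustrative; the metric choice is an assumption, not the paper's exact protocol) computes precision, recall, and F1 over undirected edge sets:

```python
def explanation_f1(predicted_edges, ground_truth_edges):
    """Edge-level F1 between a predicted explanation and the ground truth.

    Both arguments are collections of undirected edges; endpoints are
    wrapped in frozensets so (u, v) and (v, u) compare equal.
    """
    pred = {frozenset(e) for e in predicted_edges}
    truth = {frozenset(e) for e in ground_truth_edges}
    if not pred or not truth:
        return 0.0
    tp = len(pred & truth)          # correctly recovered motif edges
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(truth)
    return 2 * precision * recall / (precision + recall)

truth = {(0, 1), (1, 2), (0, 2)}        # injected motif (a triangle)
predicted = {(0, 1), (1, 2), (2, 3)}    # hypothetical explainer output
score = explanation_f1(predicted, truth)  # precision 2/3, recall 2/3
```

Averaging such a score over all positive graphs in a benchmark gives a single number per explainer, which is how the "systematic, unified evaluation" of mainstream explainers mentioned above can surface performance disparities.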