🤖 AI Summary
To address the challenge of jointly preserving structural integrity and semantic plausibility in graph counterfactual explanations, this paper is the first to recast counterfactual generation as a backtracking process and proposes the Graph Inverse Style Transfer (GIST) framework. GIST operates in the spectral domain via disentangled representation learning and combines two constraints: a spectral alignment constraint, which ensures structural connectivity, and a content fidelity constraint, which maintains semantic consistency. In doing so, it overcomes limitations of conventional forward perturbation paradigms and avoids overshooting the decision boundary. Technically, it unifies spectral graph theory, differentiable spectral interpolation, and graph neural networks to enable controllable fusion of input graph structure with target-class features. Evaluated on eight binary and multi-class graph classification benchmarks, GIST improves counterfactual validity by 7.6%, improves fidelity to the true class distribution by 45.5%, and reduces the spectral distance between inputs and their counterfactuals.
📝 Abstract
Counterfactual explainability seeks to explain model decisions by identifying minimal changes to the input that alter the predicted outcome. This task becomes particularly challenging for graph data because counterfactuals must preserve both structural integrity and semantic meaning. Unlike prior approaches that rely on forward perturbation mechanisms, we introduce Graph Inverse Style Transfer (GIST), the first framework to re-imagine graph counterfactual generation as a backtracking process, leveraging spectral style transfer. By aligning the global structure with the original input spectrum and preserving local content faithfulness, GIST produces valid counterfactuals as interpolations between the input style and counterfactual content. Tested on 8 binary and multi-class graph classification benchmarks, GIST achieves a +7.6% improvement in the validity of produced counterfactuals and significant gains (+45.5%) in faithfully explaining the true class distribution. Additionally, GIST's backtracking mechanism effectively mitigates overshooting the underlying predictor's decision boundary, minimizing the spectral differences between the input and the counterfactuals. These results challenge traditional forward perturbation methods, offering a novel perspective that advances graph explainability.
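To make the notion of "spectral differences between the input and the counterfactuals" concrete, the sketch below compares two small graphs by their Laplacian eigenvalues. It is only an illustration under stated assumptions: the abstract does not define the distance GIST uses, so the Euclidean distance between sorted Laplacian spectra is an assumed stand-in, and `interpolate_adjacency` is a hypothetical, naive blend of adjacency matrices that is not the paper's spectral style transfer. All function names are invented for this example.

```python
import numpy as np


def laplacian_spectrum(adj):
    """Eigenvalues (ascending) of the combinatorial Laplacian L = D - A
    of an undirected graph given by a symmetric adjacency matrix."""
    adj = np.asarray(adj, dtype=float)
    lap = np.diag(adj.sum(axis=1)) - adj
    return np.linalg.eigvalsh(lap)


def spectral_distance(adj_a, adj_b):
    """Assumed metric: Euclidean distance between the two Laplacian spectra.
    Both graphs must have the same number of nodes."""
    return float(np.linalg.norm(laplacian_spectrum(adj_a) - laplacian_spectrum(adj_b)))


def interpolate_adjacency(adj_style, adj_content, alpha):
    """Hypothetical toy blend: alpha = 1 returns the input ("style") graph,
    alpha = 0 returns the counterfactual ("content") graph."""
    adj_style = np.asarray(adj_style, dtype=float)
    adj_content = np.asarray(adj_content, dtype=float)
    return alpha * adj_style + (1.0 - alpha) * adj_content


# Input graph: path 0-1-2. Candidate counterfactual: triangle on the same nodes.
path = np.array([[0, 1, 0],
                 [1, 0, 1],
                 [0, 1, 0]])
triangle = np.array([[0, 1, 1],
                     [1, 0, 1],
                     [1, 1, 0]])

# Distance of each blended graph from the input, for a few blend weights.
distances = {
    alpha: spectral_distance(path, interpolate_adjacency(path, triangle, alpha))
    for alpha in (1.0, 0.5, 0.0)
}
```

In this toy case the path has Laplacian spectrum (0, 1, 3) and the triangle has (0, 3, 3), so the distance between them is 2; a blend with `alpha = 1.0` reproduces the input and has distance 0. A smaller distance means the candidate's global structure stays closer to the input's, which is the quantity the abstract reports as being minimized.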