🤖 AI Summary
This work addresses spurious shortcuts in heterophilic graphs, where recurring inductive subgraphs can mislead graph neural networks (GNNs) by reinforcing non-causal correlations. Taking a causal inference perspective, the study formally characterizes this mechanism and proposes a debiased causal graph that blocks the confounding and spillover paths responsible for these shortcuts. Building on this graph, the authors introduce Causal Disentangled GNN (CD-GNN), a framework that separates spurious inductive subgraphs from genuine causal subgraphs by blocking non-causal paths. By focusing on the causal signal, the method improves both the accuracy and the robustness of node classification. Experiments on multiple real-world datasets show that it outperforms state-of-the-art heterophily-aware baselines.
📝 Abstract
Heterophily is a prevalent property of real-world graphs and is well known to impair the performance of homophilic Graph Neural Networks (GNNs). Prior work has attempted to adapt GNNs to heterophilic graphs through non-local neighbor extension or architecture refinement. However, the fundamental reasons behind misclassifications remain poorly understood. In this work, we take a novel perspective by examining recurring inductive subgraphs, showing both empirically and theoretically that they act as spurious shortcuts that mislead GNNs and reinforce non-causal correlations in heterophilic graphs. To address this, we use causal inference to analyze and correct the biased learning behavior induced by these shortcut inductive subgraphs. We propose a debiased causal graph that explicitly blocks the confounding and spillover paths responsible for the shortcuts. Guided by this causal graph, we introduce Causal Disentangled GNN (CD-GNN), a principled framework that disentangles spurious inductive subgraphs from true causal subgraphs by blocking non-causal paths. By focusing on genuine causal signals, CD-GNN substantially improves the robustness and accuracy of node classification in heterophilic graphs. Extensive experiments on real-world datasets not only validate our theoretical findings but also demonstrate that CD-GNN outperforms state-of-the-art heterophily-aware baselines.
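For readers new to the term, a common way to quantify heterophily is the edge homophily ratio. It is the fraction of edges whose two endpoints share a class label, so values near 0 indicate a strongly heterophilic graph. The sketch below illustrates only this standard metric and is not part of CD-GNN. The function name `edge_homophily` and the toy graphs are hypothetical, chosen for illustration.

```python
from typing import Sequence, Tuple


def edge_homophily(edges: Sequence[Tuple[int, int]], labels: Sequence[int]) -> float:
    """Return the fraction of edges whose endpoints share a class label.

    Hypothetical helper illustrating the standard edge homophily ratio;
    it is not taken from the CD-GNN paper.
    """
    if not edges:
        raise ValueError("graph has no edges")
    # Count edges (u, v) where both endpoints carry the same label.
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


# Toy 4-node graph: nodes 0 and 1 are class 0; nodes 2 and 3 are class 1.
labels = [0, 0, 1, 1]
homophilous = [(0, 1), (2, 3), (0, 2)]              # 2 of 3 edges join same-class nodes
heterophilous = [(0, 2), (0, 3), (1, 2), (1, 3)]    # every edge crosses classes

print(round(edge_homophily(homophilous, labels), 3))  # 0.667
print(edge_homophily(heterophilous, labels))          # 0.0
```

A homophilic GNN that averages neighbor features works well on the first graph. On the second graph, every neighbor belongs to the other class, which is the setting the abstract targets.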