AI Summary
This work addresses class imbalance in graph node classification and identifies, for the first time, the "Randomness Anomalous Connectivity Problem" (RACP), a phenomenon induced by random seed variability that distorts neighborhood topology and causes severe performance fluctuations. To mitigate this, we propose Pure Node Sampling (PNS), a lightweight, architecture-agnostic plug-and-play module that filters out stochastic anomalous connections during synthetic node generation. PNS alleviates both quantity-level and topology-level imbalance, making the neighborhood distribution more robust without modifying the underlying GNN backbone. We provide a theoretical stability analysis to quantify the impact of random seeds. Extensive experiments across multiple benchmark datasets demonstrate that PNS significantly reduces performance variance, achieves higher average accuracy than state-of-the-art baselines, and effectively eliminates the performance degradation caused by adverse random seeds.
Abstract
Class imbalance refers to an uneven distribution of samples across the classes of a dataset, where some classes are significantly underrepresented compared to others. It is also prevalent in graph-structured data. Graph neural networks (GNNs) are typically designed under the assumption of class balance and often overlook this issue. In our investigation, we identify a problem, which we term the Randomness Anomalous Connectivity Problem (RACP), in which certain off-the-shelf models are sensitive to random seeds and suffer significant performance degradation. To eliminate the influence of these random factors, we propose PNS (Pure Node Sampling), which addresses RACP in the node synthesis stage. Unlike existing approaches that design specialized algorithms to handle either quantity imbalance or topological imbalance, PNS is a novel plug-and-play module that operates directly during node synthesis. Moreover, PNS alleviates performance degradation caused by abnormal distributions of node neighbors. We conduct a series of experiments to identify which factors are influenced by random seeds. Experimental results demonstrate the effectiveness and stability of our method, which not only eliminates the effect of unfavorable random seeds but also outperforms the baseline across various benchmark datasets with different GNN backbones. Data and code are available at https://github.com/flzeng1/PNS.
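As a rough illustration of the seed sensitivity described above, the following minimal sketch measures how much a topology statistic varies across random seeds. It is not the PNS algorithm and does not use the released code: the toy synthesis step, the function names `cross_class_edge_rate` and `seed_sensitivity`, and all parameter values are assumptions made for illustration only. Each synthetic minority node is wired to randomly sampled nodes, and the fraction of those edges that land in the majority class is recorded per seed.

```python
import random
import statistics


def cross_class_edge_rate(seed, n_major=90, n_minor=10, n_synth=80, k=3):
    """Toy stand-in for seed-dependent node synthesis (hypothetical, not PNS).

    Each synthetic minority node is wired to k nodes sampled uniformly at
    random; returns the fraction of new edges that reach the majority class.
    """
    rng = random.Random(seed)
    labels = [0] * n_major + [1] * n_minor  # 0 = majority, 1 = minority
    cross = 0
    for _ in range(n_synth):
        # The seed alone decides which neighbors a synthetic node receives.
        neighbors = rng.sample(range(len(labels)), k)
        cross += sum(1 for v in neighbors if labels[v] == 0)
    return cross / (n_synth * k)


def seed_sensitivity(metric, seeds):
    """Summarize how a scalar metric varies over a collection of seeds."""
    values = [metric(s) for s in seeds]
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }


report = seed_sensitivity(cross_class_edge_rate, range(10))
print(report)
```

In a real study, the metric passed to `seed_sensitivity` would be the test accuracy or F1 score of a trained GNN instead of this toy edge statistic; a large gap between `min` and `mean` would point to the kind of unfavorable seeds the abstract refers to.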