Pure Node Selection for Imbalanced Graph Node Classification

📅 2025-09-28
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses class imbalance in graph node classification and identifies the “Randomness Anomalous Connectivity Problem” (RACP), a phenomenon induced by random seed variability that distorts neighborhood topology and causes severe performance fluctuations. To mitigate it, the authors propose Pure Node Sampling (PNS), a lightweight, architecture-agnostic, plug-and-play module that filters out stochastic anomalous connections during synthetic node generation. PNS alleviates both quantity-level and topology-level imbalance and makes the neighborhood distribution more robust without modifying the underlying GNN backbone. The work includes a theoretical stability analysis that quantifies the impact of random seeds. Experiments across multiple benchmark datasets show that PNS significantly reduces performance variance, achieves higher average accuracy than state-of-the-art baselines, and eliminates the performance degradation caused by adverse random seeds.

๐Ÿ“ Abstract
Class imbalance refers to an uneven distribution of samples among the classes of a dataset, where some classes are significantly underrepresented compared to others. Class imbalance is also prevalent in graph-structured data. Graph neural networks (GNNs) typically assume balanced classes and therefore often overlook the issue. In our investigation, we identify a problem, which we term the Randomness Anomalous Connectivity Problem (RACP), in which certain off-the-shelf models are affected by random seeds, leading to significant performance degradation. To eliminate the influence of these random factors, we propose PNS (Pure Node Sampling), which addresses RACP in the node synthesis stage. Unlike existing approaches that design specialized algorithms to handle either quantity imbalance or topological imbalance, PNS is a novel plug-and-play module that operates directly during node synthesis to mitigate RACP. PNS also alleviates the performance degradation caused by an abnormal distribution of node neighbors. We conduct a series of experiments to identify which factors are influenced by random seeds. Experimental results demonstrate the effectiveness and stability of our method, which not only eliminates the effect of unfavorable random seeds but also outperforms the baselines across various benchmark datasets with different GNN backbones. Data and code are available at https://github.com/flzeng1/PNS.
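The two quantities the abstract keeps returning to, how imbalanced the classes are and how much a metric moves between random seeds, can be measured with a few lines of Python. The sketch below is only an illustration and is not taken from the PNS repository: the helper names `imbalance_ratio` and `seed_stability` are hypothetical, and the toy inputs are invented rather than results from the paper.

```python
from collections import Counter
from statistics import mean, pstdev


def imbalance_ratio(labels):
    """Size of the largest class divided by the size of the smallest class."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def seed_stability(scores_by_seed):
    """Summarize a metric that was measured once per random seed.

    `scores_by_seed` maps seed -> score (for example, test accuracy).
    """
    scores = list(scores_by_seed.values())
    return {
        "mean": mean(scores),
        "std": pstdev(scores),  # population std over the seeds that were run
        "gap": max(scores) - min(scores),  # best seed minus worst seed
        "worst_seed": min(scores_by_seed, key=scores_by_seed.get),
    }


if __name__ == "__main__":
    # Toy inputs, invented purely for illustration (not results from the paper).
    toy_labels = [0] * 20 + [1] * 20 + [2] * 4
    toy_scores = {0: 0.70, 1: 0.72, 2: 0.40}

    print("imbalance ratio:", imbalance_ratio(toy_labels))
    print("seed summary:", seed_stability(toy_scores))
```

A large `gap` or `std` relative to `mean`, together with a single outlying `worst_seed`, is the kind of seed sensitivity the abstract describes. Reporting these alongside the average makes an unfavorable seed visible instead of letting it be averaged away.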
Problem

Research questions and friction points this paper is trying to address.

Addressing class imbalance in graph node classification tasks
Solving the Randomness Anomalous Connectivity Problem (RACP) in graph neural networks
Mitigating performance degradation from abnormal neighbor distribution
Innovation

Methods, ideas, or system contributions that make the work stand out.

Proposes Pure Node Sampling (PNS) for the node synthesis stage
Plug-and-play module that mitigates the Randomness Anomalous Connectivity Problem (RACP)
Alleviates performance degradation from abnormal neighbor distribution
Authors

Fanlong Zeng
School of Intelligent Systems Science and Engineering, Jinan University, Zhuhai 519070, China
Wensheng Gan
School of Intelligent Systems Science and Engineering, Jinan University, Zhuhai 519070, China
Jiayang Wu
Jinan University
AI for science
Philip S. Yu
Professor of Computer Science, University of Illinois at Chicago
Data mining, Database, Privacy