IceBerg: Debiased Self-Training for Class-Imbalanced Node Classification

📅 2025-02-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the performance degradation of Graph Neural Networks (GNNs) in class-imbalanced and few-shot node classification, this paper proposes a debiased self-training framework that leverages abundant unlabeled nodes to strengthen weak-class supervision signals. Methodologically, it integrates semi-supervised learning, self-training, and class rebalancing strategies. Key contributions include: (1) a novel Double Balancing module, which jointly mitigates the Matthew effect and the label distribution shift inherent in self-training; and (2) the decoupling of GNNs' propagation and transformation operations, which enhances the modeling of long-range weak-class signals. Extensive experiments demonstrate that the method significantly outperforms existing approaches on multiple benchmark datasets for imbalanced node classification and achieves state-of-the-art performance on few-shot node classification tasks.

📝 Abstract
Graph Neural Networks (GNNs) have achieved great success in dealing with non-Euclidean graph-structured data and have been widely deployed in many real-world applications. However, their effectiveness is often jeopardized under class-imbalanced training sets. Most existing studies have analyzed class-imbalanced node classification from a supervised learning perspective, but they do not fully utilize the large number of unlabeled nodes in semi-supervised scenarios. We claim that the supervised signal is just the tip of the iceberg and a large number of unlabeled nodes have not yet been effectively utilized. In this work, we propose IceBerg, a debiased self-training framework that addresses the class-imbalanced and few-shot challenges for GNNs at the same time. Specifically, to address the Matthew effect and label distribution shift in self-training, we propose Double Balancing, which can largely improve the performance of existing baselines with just a few lines of code as a simple plug-and-play module. Secondly, to enhance the long-range propagation capability of GNNs, we disentangle the propagation and transformation operations of GNNs, so that weak supervision signals can propagate more effectively to address the few-shot issue. In summary, we find that leveraging unlabeled nodes can significantly enhance the performance of GNNs in class-imbalanced and few-shot scenarios, and even small, surgical modifications can lead to substantial performance improvements. Systematic experiments on benchmark datasets show that our method can deliver considerable performance gains over existing class-imbalanced node classification baselines. Additionally, due to IceBerg's outstanding ability to leverage unsupervised signals, it also achieves state-of-the-art results in few-shot node classification scenarios. The code of IceBerg is available at: https://github.com/ZhixunLEE/IceBerg.
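The abstract describes Double Balancing as a plug-and-play loss module that counters two biases of self-training at once: the Matthew effect (majority classes dominating the supervised loss) and the label distribution shift of pseudo-labels. The sketch below illustrates one plausible realization of this idea, not the paper's exact recipe (see the linked repo for that): inverse-frequency weighting on the labeled loss, plus a logit-adjusted, confidence-thresholded pseudo-label loss on the unlabeled nodes. The function name, the threshold `tau`, and the prior-estimation choice are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def double_balanced_loss(logits_l, y_l, logits_u, num_classes, tau=0.9):
    """Hedged sketch of a 'double balancing' objective (assumptions, not
    the paper's exact implementation):
      (1) class-balance the supervised loss with inverse-frequency weights;
      (2) debias the pseudo-label loss by subtracting the log of the
          model's estimated class prior (logit adjustment)."""
    # (1) Supervised loss, reweighted against the Matthew effect.
    counts = torch.bincount(y_l, minlength=num_classes).float().clamp(min=1)
    weights = counts.sum() / (num_classes * counts)   # rare classes get larger weight
    sup_loss = F.cross_entropy(logits_l, y_l, weight=weights)

    # (2) Pseudo-label loss on unlabeled nodes, debiased against
    # label distribution shift via a logit adjustment.
    probs_u = logits_u.softmax(dim=-1)
    prior = probs_u.mean(dim=0).detach()              # estimated class prior
    debiased_logits = logits_u - prior.clamp(min=1e-6).log()
    conf, pseudo = probs_u.max(dim=-1)
    mask = conf > tau                                 # keep only confident pseudo-labels
    per_node = F.cross_entropy(debiased_logits, pseudo, reduction="none")
    unsup_loss = (per_node * mask).mean()

    return sup_loss + unsup_loss
```

The appeal of such a module matches the abstract's claim: it touches only the loss computation, so it can be bolted onto an existing semi-supervised training loop in a few lines.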
Problem

Research questions and friction points this paper is trying to address.

Addresses class-imbalanced node classification in GNNs.
Enhances GNN performance in few-shot learning scenarios.
Utilizes unlabeled nodes to improve semi-supervised learning.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Debiased self-training for GNNs
Double Balancing for class imbalance
Enhanced propagation in GNNs
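The "enhanced propagation" point refers to disentangling GNN transformation from propagation so that weak supervision signals reach distant nodes. A common way to realize this pattern (used here purely as an illustrative sketch; IceBerg's actual forward pass may differ) is APPNP-style personalized PageRank: transform node features once with an MLP, then run several parameter-free propagation steps over the normalized adjacency.

```python
import torch
import torch.nn as nn

def decoupled_forward(x, adj_norm, mlp, K=10, alpha=0.1):
    """Sketch of a decoupled GNN forward pass (APPNP-style assumption):
    transformation first, propagation second.
    adj_norm: dense, symmetrically normalized adjacency matrix [N, N]."""
    # Transformation: applied per node, independent of graph structure.
    h0 = mlp(x)
    # Propagation: K parameter-free smoothing steps, so supervision from
    # a handful of labeled nodes can spread over long ranges.
    h = h0
    for _ in range(K):
        h = (1 - alpha) * (adj_norm @ h) + alpha * h0  # teleport back to h0
    return h
```

Because the propagation steps carry no learnable weights, increasing `K` extends the receptive field without adding parameters or oversmoothing the transformed features away from `h0`.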