🤖 AI Summary
To address label scarcity in semi-supervised node classification, this paper unifies self-supervised graph clustering with node classification, leveraging intrinsic community structure as a supervisory signal for enhanced representation learning. Methodologically, we propose the Soft Orthogonal Graph Network (SOGN), which jointly optimizes classification and clustering via a dual-clustering objective and Sinkhorn-Knopp normalization to generate balanced soft pseudo-labels. The framework integrates a supervised classification loss with an unsupervised clustering loss and is compatible with various GNN backbones. Theoretically, we establish, for the first time, a unified connection between the GNN optimization objective and spectral clustering. Extensive experiments on seven real-world graph benchmarks demonstrate significant improvements over state-of-the-art methods, with strong generalization capability and high training stability.
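As a rough sketch of the Sinkhorn-Knopp step described above (not the paper's implementation; the function name, temperature `eps`, and iteration count are illustrative assumptions), the idea is to alternately rescale the rows and columns of an assignment matrix so that each node's assignment sums to 1 while each cluster receives roughly equal total mass, yielding balanced soft pseudo-labels:

```python
import numpy as np

def sinkhorn_knopp(logits, n_iters=3, eps=0.05):
    """Balanced soft pseudo-labels via Sinkhorn-Knopp normalization.

    logits: (n_nodes, n_clusters) cluster-assignment scores.
    Alternately normalizes columns (so clusters are balanced) and rows
    (so each node's assignment is a probability distribution).
    Hypothetical sketch; defaults are illustrative, not the paper's.
    """
    Q = np.exp(logits / eps)               # positive assignment matrix
    Q /= Q.sum()                           # total mass = 1
    n, k = Q.shape
    for _ in range(n_iters):
        Q /= Q.sum(axis=0, keepdims=True)  # each column sums to 1
        Q /= k                             # ... then to 1/k (balanced clusters)
        Q /= Q.sum(axis=1, keepdims=True)  # each row sums to 1
        Q /= n                             # ... then to 1/n (total mass = 1)
    return Q * n                           # each row sums to 1: soft pseudo-labels
```

With enough iterations, each row of the result is a valid soft label and each column's total mass approaches `n_nodes / n_clusters`, which is what prevents the degenerate solution where all nodes collapse into one cluster.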
📝 Abstract
The emergence of graph neural networks (GNNs) has offered a powerful tool for semi-supervised node classification tasks. Subsequent studies have achieved further improvements by refining the message passing schemes in GNN models or by exploiting data augmentation techniques to mitigate limited supervision. In real graphs, nodes often form tightly-knit communities/clusters, which embody abundant signals for compensating label scarcity in semi-supervised node classification but remain unexplored in prior methods.
Motivated by this, this paper presents NCGC, which integrates self-supervised graph clustering and semi-supervised classification into a unified framework. First, we theoretically unify the optimization objectives of GNNs and spectral graph clustering, and based on that, develop soft orthogonal GNNs (SOGNs) that leverage a refined message passing paradigm to generate node representations for both classification and clustering. Building on this, NCGC includes a self-supervised graph clustering module that enables the training of SOGNs to learn representations of unlabeled nodes in a self-supervised manner. In particular, this component comprises two non-trivial clustering objectives and a Sinkhorn-Knopp normalization that transforms predicted cluster assignments into balanced soft pseudo-labels. By combining this clustering module with the classification model under a multi-task objective that couples the supervised classification loss on labeled data with the self-supervised clustering loss on unlabeled data, NCGC promotes synergy between the two tasks and achieves enhanced model capacity. Our extensive experiments show that the proposed NCGC framework consistently and considerably outperforms popular GNN models and recent baselines for semi-supervised node classification on seven real graphs, when working with various classic GNN backbones.
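The multi-task objective described in the abstract can be sketched as follows. This is a minimal illustration, not NCGC's actual loss: the function name, the single weighting coefficient `lam`, and the use of one cross-entropy-style clustering term (rather than the paper's two clustering objectives) are simplifying assumptions:

```python
import numpy as np

def multitask_loss(probs, labels, labeled_mask, pseudo, lam=1.0):
    """Illustrative multi-task objective (hypothetical, simplified).

    probs:        (n_nodes, n_classes) predicted class probabilities.
    labels:       (n_nodes,) ground-truth labels (used only where labeled).
    labeled_mask: (n_nodes,) boolean mask of labeled nodes.
    pseudo:       (n_nodes, n_classes) balanced soft pseudo-labels
                  (e.g. from Sinkhorn-Knopp normalization).
    lam:          weight on the self-supervised term (assumed scalar).
    """
    tiny = 1e-12  # numerical floor for the logarithms
    # Supervised cross-entropy on labeled nodes.
    sup = -np.mean(np.log(probs[labeled_mask, labels[labeled_mask]] + tiny))
    # Self-supervised cross-entropy against soft pseudo-labels on unlabeled nodes.
    unl = ~labeled_mask
    clu = -np.mean(np.sum(pseudo[unl] * np.log(probs[unl] + tiny), axis=1))
    return sup + lam * clu
```

The key design point reflected here is that the supervised term only touches the (few) labeled nodes, while the clustering term supplies a training signal for every unlabeled node, which is how the community structure compensates for label scarcity.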