🤖 AI Summary
This paper targets two weaknesses of existing attributed graph clustering methods, insufficient modeling of local dependencies and difficulty capturing global structure, both of which stem from graph sparsity and node attribute heterogeneity. It proposes GCL-GCN, a deep graph clustering framework that integrates global and local representation learning. Its core contributions are: (1) a centrality-enhanced spatial attention Graphormer module that jointly encodes node centrality and spatial relationships, explicitly modeling both long-range dependencies and local neighborhood structures; and (2) a contrastive learning-based two-stage pretraining strategy that improves the discriminability of node representations and the robustness of clustering. The framework jointly optimizes the representation learning and clustering objectives end to end. Experiments on six benchmark datasets show improvements over 14 state-of-the-art methods: on Cora, it gains 4.94% in ACC, 13.01% in NMI, and 10.97% in ARI over the primary comparison method MBN, supporting its effectiveness and stability.
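To make the two structural signals named above concrete, the following is a minimal sketch of how node centrality and spatial relationships are commonly computed for a Graphormer-style model: node degree as the centrality measure and shortest-path hop count as the spatial relation. This follows the original Graphormer design, not necessarily the exact module in GCL-GCN; the function names, the `unreachable` sentinel, and the toy graph are all illustrative assumptions.

```python
from collections import deque


def degrees(adj):
    """Degree of each node; adj is an undirected adjacency list."""
    return [len(neighbors) for neighbors in adj]


def shortest_path_hops(adj, unreachable=-1):
    """All-pairs hop distances, computed with one BFS per source node.

    Pairs with no connecting path keep the `unreachable` sentinel
    (an assumed convention for this sketch).
    """
    n = len(adj)
    dist = [[unreachable] * n for _ in range(n)]
    for source in range(n):
        dist[source][source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[source][v] == unreachable:
                    dist[source][v] = dist[source][u] + 1
                    queue.append(v)
    return dist


# Hypothetical toy graph: path 0 - 1 - 2 - 3 plus an isolated node 4.
toy_adj = [[1], [0, 2], [1, 3], [2], []]
toy_degrees = degrees(toy_adj)
toy_hops = shortest_path_hops(toy_adj)
```

In the original Graphormer, each degree value indexes a learnable embedding that is added to the node's input features, and each hop distance indexes a learnable scalar bias that is added to the attention score between the two nodes. Whether GCL-GCN uses the same indexing scheme is not stated here.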
📝 Abstract
Attributed graph clustering plays an important role in modern data analysis. However, the complexity of graph data and the heterogeneity of node attributes make it difficult to exploit graph information for clustering. We propose GCL-GCN, a novel deep graph clustering model designed to overcome the limitations of existing models in capturing local dependencies and complex structures in sparse and heterogeneous graph data. GCL-GCN introduces a Graphormer module that combines centrality encoding with spatial relationships, capturing both global and local information among nodes and thereby improving the quality of node representations. We also propose a contrastive learning module that strengthens the discriminative power of the feature representations: in the pre-training phase, it applies contrastive learning to the original feature matrix so that the initial representations passed to the subsequent graph convolution and clustering tasks are easier to distinguish. Extensive experiments on six datasets show that GCL-GCN outperforms 14 advanced methods in clustering quality and robustness. On the Cora dataset, it improves ACC, NMI, and ARI by 4.94%, 13.01%, and 10.97%, respectively, over the primary comparison method MBN.
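The abstract does not specify the contrastive objective used during pre-training. As an illustration of the general idea, the sketch below implements a generic InfoNCE-style loss over two views of a feature matrix: row `i` of one view is pulled toward row `i` of the other and pushed away from all other rows. The choice of InfoNCE, the cosine similarity, the `temperature` value, and the function names are assumptions made for this example, not details taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(view_a, view_b, temperature=0.5):
    """Mean InfoNCE loss; (view_a[i], view_b[i]) is the positive pair.

    Every other row of view_b acts as a negative for view_a[i].
    """
    total = 0.0
    for i, anchor in enumerate(view_a):
        logits = [cosine(anchor, other) / temperature for other in view_b]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        max_logit = max(logits)
        log_denominator = max_logit + math.log(
            sum(math.exp(logit - max_logit) for logit in logits)
        )
        total += log_denominator - logits[i]
    return total / len(view_a)


# Hypothetical two-node feature views: one aligned, one with rows swapped.
features = [[1.0, 0.0], [0.0, 1.0]]
aligned_view = [[0.9, 0.1], [0.1, 0.9]]
swapped_view = [[0.1, 0.9], [0.9, 0.1]]
aligned_loss = info_nce(features, aligned_view)
swapped_loss = info_nce(features, swapped_view)
```

The loss is lower when matching rows of the two views are similar, so minimizing it encourages each node's representation to stand apart from the others, which is the kind of "more distinguishable initial representation" the abstract describes.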