🤖 AI Summary
Existing attributed graph clustering methods face three key challenges: inadequate modeling of long-range dependencies, feature collapse, and loss of structural information during graph coarsening. To address these, this paper proposes a multi-scale, weight-based pairwise coarsening framework combined with one-to-many contrastive learning. First, an edge merging strategy guided by global node similarity preserves both long-range dependencies and fine-grained structural details during coarsening. Second, a cluster-center-aware one-to-many contrastive objective mitigates the feature masking caused by high-degree nodes and increases representation diversity. Third, the multi-scale, weight-aware coarsening mechanism is optimized jointly with a graph reconstruction loss and a KL-divergence regularizer to keep node representations structurally consistent across scales. Experiments on ACM, Citeseer, Cora, and DBLP show robust performance gains, with normalized mutual information (NMI) improving by up to 15.24% (on ACM).
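As a rough illustration of what a one-to-many contrastive objective can look like, the sketch below scores one anchor embedding against several positives (for example, the same node in an augmented view and its cluster centroid) and a set of negatives. It is an InfoNCE-style loss with multiple positives, written under stated assumptions: the function name `one_to_many_loss`, the use of cosine similarity, and the temperature `tau` are illustrative choices, not the paper's actual formulation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def one_to_many_loss(anchor, positives, negatives, tau=0.5):
    """Hypothetical one-to-many contrastive loss for a single anchor.

    Every positive is contrasted against the same pool of positives and
    negatives, and the per-positive log-likelihood terms are averaged.
    """
    pos_scores = [math.exp(cosine(anchor, p) / tau) for p in positives]
    neg_scores = [math.exp(cosine(anchor, q) / tau) for q in negatives]
    denom = sum(pos_scores) + sum(neg_scores)
    return -sum(math.log(s / denom) for s in pos_scores) / len(pos_scores)


# Toy usage: the anchor's positives are its augmented view and a centroid.
anchor = [1.0, 0.0]
augmented_view = [0.9, 0.1]
centroid = [1.0, 0.0]
other_nodes = [[0.0, 1.0], [-1.0, 0.0]]
loss = one_to_many_loss(anchor, [augmented_view, centroid], other_nodes)
```

The loss shrinks as the anchor moves closer to all of its positives at once, which is the sense in which a single node is pulled toward "many" targets instead of a single augmented copy.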
📝 Abstract
This study introduces the Multi-Scale Weight-Based Pairwise Coarsening and Contrastive Learning (MPCCL) model, an approach to attributed graph clustering that addresses three gaps in existing methods: weak modeling of long-range dependencies, feature collapse, and information loss. Traditional methods often struggle to capture high-order graph features because they rely on low-order attribute information, while contrastive learning techniques limit feature diversity by overemphasizing local neighborhood structure. Conventional graph coarsening methods reduce graph scale but frequently lose fine-grained structural details. MPCCL addresses these challenges with a multi-scale coarsening strategy that progressively condenses the graph and prioritizes merging key edges according to global node similarity, so that essential structural information is preserved. It further introduces a one-to-many contrastive learning paradigm that integrates node embeddings with augmented graph views and cluster centroids to enhance feature diversity, while mitigating the feature masking caused by the accumulation of high-frequency node weights during multi-scale coarsening. By incorporating a graph reconstruction loss and KL divergence into its self-supervised learning framework, MPCCL ensures cross-scale consistency of node representations. Experimental evaluations show that MPCCL significantly improves clustering performance, with a 15.24% increase in NMI on the ACM dataset and robust gains on the smaller-scale Citeseer, Cora, and DBLP datasets.
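To make the coarsening idea concrete, here is a minimal sketch of a single pairwise coarsening pass: edges are ranked by the similarity of their endpoints, the most similar pairs are merged first, and each merged node keeps a weight counting how many original nodes it represents. Several details are assumptions made for illustration and are not taken from the paper: cosine similarity of node attributes stands in for "global node similarity", merged features are a weight-averaged mean, and the name `coarsen_once` is hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def coarsen_once(features, edges, weights=None):
    """One greedy pairwise coarsening pass (illustrative sketch).

    features: list of per-node feature vectors
    edges:    list of undirected (i, j) pairs
    weights:  per-node weights (original nodes represented); defaults to 1
    Returns (new_features, new_edges, new_weights, assignment).
    """
    n = len(features)
    if weights is None:
        weights = [1.0] * n

    # Rank edges so the most similar endpoint pairs are merged first.
    ranked = sorted(
        edges,
        key=lambda e: cosine(features[e[0]], features[e[1]]),
        reverse=True,
    )

    # Each node joins at most one merge per pass (pairwise coarsening).
    assignment = [-1] * n
    next_id = 0
    for i, j in ranked:
        if i != j and assignment[i] == -1 and assignment[j] == -1:
            assignment[i] = assignment[j] = next_id
            next_id += 1
    for i in range(n):  # unmatched nodes become their own super-node
        if assignment[i] == -1:
            assignment[i] = next_id
            next_id += 1

    # Super-node features are the weight-averaged features of their members.
    dim = len(features[0])
    new_features = [[0.0] * dim for _ in range(next_id)]
    new_weights = [0.0] * next_id
    for i in range(n):
        s = assignment[i]
        new_weights[s] += weights[i]
        for d in range(dim):
            new_features[s][d] += weights[i] * features[i][d]
    for s in range(next_id):
        new_features[s] = [x / new_weights[s] for x in new_features[s]]

    # Keep only edges that still connect two different super-nodes.
    new_edges = sorted(
        {
            (min(assignment[i], assignment[j]), max(assignment[i], assignment[j]))
            for i, j in edges
            if assignment[i] != assignment[j]
        }
    )
    return new_features, new_edges, new_weights, assignment


# Toy usage: a 4-node path whose two ends form two similar pairs.
feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
path_edges = [(0, 1), (1, 2), (2, 3)]
new_feats, new_edges, new_weights, assign = coarsen_once(feats, path_edges)
```

Calling `coarsen_once` repeatedly on its own output (passing `new_weights` back in) yields a multi-scale hierarchy. The growing weights also show why heavily merged nodes can dominate averaged features, which is the kind of masking effect the abstract says the contrastive objective is meant to counteract.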