🤖 AI Summary
Existing graph clustering methods struggle to model the uncertainty of node visits inherent in random walks, and they underuse the structural information embedded in the data. To address this, we propose CMDI, a framework that reformulates graph clustering as an abstract clustering task aimed at maximizing decoding information (MDI). It proceeds in two sequential phases: graph structure extraction and vertex partitioning. CMDI is the first to incorporate two-dimensional structural information theory into graph clustering, using it to explicitly characterize visit uncertainty and to integrate prior knowledge. It jointly leverages graph neural networks (GNNs), probabilistic analysis of random walks, and structural information entropy to construct an end-to-end MDI optimization framework. On three real-world datasets, CMDI achieves a higher decoding information ratio (DI-R) than classical baselines, demonstrating higher-quality decoding information and greater computational efficiency, particularly when prior knowledge is incorporated.
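For reference, the quantities behind "structural information entropy" and "decoding information" can be written out as below. This uses the standard definitions of one- and two-dimensional structural entropy for a connected, undirected, unweighted graph $G=(V,E)$ with degrees $d_i$ and volume $\mathrm{vol}(G)=\sum_i d_i$; a random walk in its stationary state visits vertex $i$ with probability $d_i/\mathrm{vol}(G)$. For a partition $P=\{X_1,\dots,X_L\}$, $V_j=\sum_{i\in X_j} d_i$ is the volume of block $X_j$ and $g_j$ is the number of edges with exactly one endpoint in $X_j$. Whether CMDI uses weighted variants of these formulas, or how exactly it normalizes DI-R, is not stated in this summary and is left open here.

```latex
H^{1}(G) = -\sum_{i \in V} \frac{d_i}{\mathrm{vol}(G)} \log_2 \frac{d_i}{\mathrm{vol}(G)}

H^{P}(G) = -\sum_{j=1}^{L} \sum_{i \in X_j} \frac{d_i}{\mathrm{vol}(G)} \log_2 \frac{d_i}{V_j}
           -\sum_{j=1}^{L} \frac{g_j}{\mathrm{vol}(G)} \log_2 \frac{V_j}{\mathrm{vol}(G)}

D^{P}(G) = H^{1}(G) - H^{P}(G)
         = -\sum_{j=1}^{L} \frac{V_j - g_j}{\mathrm{vol}(G)} \log_2 \frac{V_j}{\mathrm{vol}(G)}
```

The last line follows from splitting $\log_2 \frac{d_i}{\mathrm{vol}(G)} = \log_2 \frac{d_i}{V_j} + \log_2 \frac{V_j}{\mathrm{vol}(G)}$ inside $H^1(G)$. Since $(V_j - g_j)/\mathrm{vol}(G)$ is the stationary probability that a random-walk step stays inside $X_j$, decoding information is zero for both the all-singleton and the single-block partition and grows when blocks retain many walk steps while staying small in volume.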
📝 Abstract
Graph-based clustering has garnered increasing attention for its wide applicability across knowledge domains. Because it integrates readily with related applications, graph-based clustering analysis can robustly extract the "natural associations" or "graph structures" within a dataset, which facilitates modelling the relationships between data points. Despite this efficacy, current graph-based clustering methods overlook both the uncertainty of random-walk visits between nodes and the structural information embedded in the data. To address this gap, we present CMDI, a novel Clustering method for Maximizing Decoding Information within graph-based models. CMDI incorporates two-dimensional structural information theory into the clustering process, which consists of two phases: graph structure extraction and graph vertex partitioning. Within CMDI, graph partitioning is reformulated as an abstract clustering problem that maximizes decoding information, thereby minimizing the uncertainty associated with random visits to vertices. Empirical evaluations on three real-world datasets show that CMDI outperforms classical baseline methods, achieving a higher decoding information ratio (DI-R). CMDI is also more efficient, particularly when prior knowledge (PK) is taken into account. These findings underscore the effectiveness of CMDI in improving both decoding information quality and computational efficiency, making it a valuable tool for graph-based clustering analysis.
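To make "maximizing decoding information" concrete, here is a minimal sketch assuming the standard structural-entropy definitions for an undirected, unweighted graph. It is not CMDI's algorithm: it skips the graph structure extraction phase and prior knowledge entirely, and it finds the best partition by brute-force enumeration, which is only feasible for a handful of vertices. All function names are hypothetical, and treating DI-R as $D^P(G)/H^1(G)$ in the final print is an assumption, not the paper's stated definition.

```python
import math
from collections import defaultdict


def degrees(edges):
    """Vertex degrees of an undirected, unweighted graph given as an edge list."""
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)


def one_dim_entropy(edges):
    """H^1(G): entropy of the stationary random-walk distribution d_i / vol(G)."""
    deg = degrees(edges)
    vol = sum(deg.values())
    return -sum((d / vol) * math.log2(d / vol) for d in deg.values())


def two_dim_entropy(edges, partition):
    """H^P(G): two-dimensional structural entropy of G under partition P."""
    deg = degrees(edges)
    vol = sum(deg.values())
    total = 0.0
    for block in partition:
        block = set(block)
        v_j = sum(deg[i] for i in block)  # volume of the block
        # g_j: number of edges with exactly one endpoint inside the block
        g_j = sum(1 for u, v in edges if (u in block) != (v in block))
        total -= sum((deg[i] / vol) * math.log2(deg[i] / v_j) for i in block)
        total -= (g_j / vol) * math.log2(v_j / vol)
    return total


def decoding_information(edges, partition):
    """D^P(G) = H^1(G) - H^P(G): uncertainty removed by knowing the partition."""
    return one_dim_entropy(edges) - two_dim_entropy(edges, partition)


def set_partitions(items):
    """Yield every partition of the list `items` exactly once, as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in set_partitions(rest):
        yield [[first]] + part  # `first` opens a new block
        for k in range(len(part)):  # or joins an existing block
            yield part[:k] + [[first] + part[k]] + part[k + 1:]


def best_partition(edges):
    """Exhaustive search for the partition with maximum decoding information."""
    vertices = sorted(degrees(edges))
    return max(set_partitions(vertices), key=lambda p: decoding_information(edges, p))


# Toy graph: two triangles joined by the bridge edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
best = best_partition(edges)
print(sorted(sorted(b) for b in best))  # → [[0, 1, 2], [3, 4, 5]]
d = decoding_information(edges, best)
print(round(d, 4), round(d / one_dim_entropy(edges), 4))  # D and the assumed DI-R
```

On this toy graph the trivial partitions (all singletons, or one block) carry zero decoding information, while the two triangles give $D = 6/7$ bits, the maximum over all 203 partitions. A practical method would replace the exhaustive search with a scalable heuristic.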