🤖 AI Summary
This work addresses the over-smoothing problem prevalent in neighborhood-aggregation-based graph embedding methods, which diminishes node representational distinctiveness after multiple iterations and thereby degrades community detection performance. To mitigate this issue, the authors propose a training-free, weighted distribution-aware kernel method that explicitly integrates each node’s intrinsic features with its degree distribution within the Weisfeiler–Lehman framework. This approach is the first to incorporate degree distribution into optimization-free graph embeddings, substantially enhancing node expressiveness. When combined with spectral clustering, the method consistently outperforms state-of-the-art graph embedding techniques—including deep learning approaches—on standard community detection benchmarks.
📝 Abstract
Neighborhood Aggregation Strategy (NAS) is a widely used approach in graph embedding, underpinning both Graph Neural Networks (GNNs) and Weisfeiler-Lehman (WL) methods. However, NAS-based methods are identified to be prone to over-smoothing-the loss of node distinguishability with increased iterations-thereby limiting their effectiveness. This paper identifies two characteristics in a network, i.e., the distributions of nodes and node degrees that are critical for expressive representation but have been overlooked in existing methods. We show that these overlooked characteristics contribute significantly to over-smoothing of NAS-methods. To address this, we propose a novel weighted distribution-aware kernel that embeds nodes while taking their distributional characteristics into consideration. Our method has three distinguishing features: (1) it is the first method to explicitly incorporate both distributional characteristics; (2) it requires no optimization; and (3) it effectively mitigates the adverse effects of over-smoothing, allowing WL to preserve node distinguishability and expressiveness even after many iterations of embedding. Experiments demonstrate that our method achieves superior community detection performance via spectral clustering, outperforming existing graph embedding methods, including deep learning methods, on standard benchmarks.