🤖 AI Summary
Clustering directed graphs is complicated by edge asymmetry: conventional spectral methods typically symmetrize the adjacency matrix, which can discard the information carried by edge direction. This paper proposes a generalized spectral clustering framework that handles both directed and undirected graphs. It introduces the generalized Dirichlet energy of a graph function, defined with respect to an arbitrary positive regularizing measure on the graph edges, and obtains the clustering by spectral relaxation of this functional. A practical parametrization of the regularizing measure is built from the iterated powers of the natural random walk on the graph. Theoretical arguments explain why the framework is effective when classes are unbalanced. In experiments on directed K-NN graphs constructed from real datasets, the method performs consistently well in all cases and outperforms existing approaches in most of them.
📝 Abstract
Spectral clustering is a popular approach for clustering undirected graphs, but its extension to directed graphs (digraphs) is much more challenging. A typical workaround is to naively symmetrize the adjacency matrix of the directed graph, which can, however, discard valuable information carried by edge directionality. In this paper, we present a generalized spectral clustering framework that can address both directed and undirected graphs. Our approach is based on the spectral relaxation of a new functional that we introduce as the generalized Dirichlet energy of a graph function, with respect to an arbitrary positive regularizing measure on the graph edges. We also propose a practical parametrization of the regularizing measure constructed from the iterated powers of the natural random walk on the graph. We present theoretical arguments to explain the efficiency of our framework in the challenging setting of unbalanced classes. Experiments using directed K-NN graphs constructed from real datasets show that our graph partitioning method performs consistently well in all cases, while outperforming existing approaches in most of them.
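The abstract gives no formulas, so the following is only a minimal sketch of the general idea under stated assumptions, not the paper's algorithm. It assumes the energy takes the form of a sum over ordered pairs of q(x, y) (f(x) - f(y))^2 for a nonnegative edge measure q, and that q(x, y) = nu(x) P(x, y), where P is the random-walk transition matrix and nu is the distribution of a uniformly started walk after `t` steps, raised to a power `gamma`. The function names and the parameters `t` and `gamma` are hypothetical. For any such q the energy equals f^T L f for the symmetric matrix L built below, so a two-way split can be read off the eigenvector of the second-smallest eigenvalue.

```python
import numpy as np

def transition_matrix(A):
    """Row-normalize a nonnegative adjacency matrix into a random-walk matrix P."""
    A = np.asarray(A, dtype=float)
    out_weight = A.sum(axis=1)
    if np.any(out_weight == 0):
        raise ValueError("every node needs at least one outgoing edge")
    return A / out_weight[:, None]

def edge_measure(P, t=3, gamma=1.0):
    """Assumed (hypothetical) edge measure q(x, y) = nu(x) * P(x, y).

    nu is the distribution after t steps of a walk started uniformly at
    random, raised elementwise to gamma. Nodes the walk cannot reach in
    t steps get nu = 0, so their outgoing edges carry no weight.
    """
    n = P.shape[0]
    nu = (np.full(n, 1.0 / n) @ np.linalg.matrix_power(P, t)) ** gamma
    return nu[:, None] * P

def dirichlet_energy(Q, f):
    """Sum over ordered pairs (x, y) of Q[x, y] * (f[x] - f[y])**2."""
    diff = f[:, None] - f[None, :]
    return float((Q * diff ** 2).sum())

def energy_laplacian(Q):
    """Symmetric matrix L such that f @ L @ f equals dirichlet_energy(Q, f)."""
    return np.diag(Q.sum(axis=1) + Q.sum(axis=0)) - Q - Q.T

def bipartition(A, t=3, gamma=1.0):
    """Two-way split from the sign of the second eigenvector of L."""
    Q = edge_measure(transition_matrix(A), t=t, gamma=gamma)
    _, vecs = np.linalg.eigh(energy_laplacian(Q))  # eigenvalues ascending
    return (vecs[:, 1] > 0).astype(int)

# Toy directed graph: two dense groups {0, 1, 2} and {3, 4, 5},
# joined by two weak directed edges (2 -> 3 and 5 -> 0).
A_demo = np.array([
    [0.0, 1.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.1, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 0.0, 1.0],
    [0.1, 0.0, 0.0, 1.0, 1.0, 0.0],
])
labels = bipartition(A_demo)
```

The sketch uses an unnormalized relaxation and a sign split to stay dependency-free; a k-way version would typically run k-means on the first k eigenvectors, and the choice of normalization is left open here.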