🤖 AI Summary
This work addresses the sensitivity to initialization of existing inference methods for the degree-corrected block model (DCBM), which are heuristics that are typically initialized at random. It shows that DCBM inference can be reformulated as a constrained nonnegative matrix factorization problem, and builds on this to propose a community detection method and a theoretically grounded initialization strategy. The approach does not assume any specific network structure: it applies to any graph representable by a DCBM, not only assortative ones. On synthetic and real benchmark networks, the method detects communities comparable to those found by DCBM inference while scaling linearly with the number of edges and communities; for instance, it processes a graph with 100,000 nodes and 2,000,000 edges in approximately 4 minutes. The initialization strategy also significantly improves solution quality and reduces the number of iterations required by all tested inference algorithms.
📝 Abstract
Community detection is a fundamental task in data analysis. Block models form a standard approach to partition nodes according to a graph model, facilitating the analysis and interpretation of the network structure. By grouping nodes with similar connection patterns, they enable the identification of a wide variety of underlying structures. The degree-corrected block model (DCBM) is an established model that accounts for the heterogeneity of node degrees. However, existing inference methods for the DCBM are heuristics that are highly sensitive to initialization, typically done randomly. In this work, we show that DCBM inference can be reformulated as a constrained nonnegative matrix factorization problem. Leveraging this insight, we propose a novel method for community detection and a theoretically well-grounded initialization strategy that provides an initial estimate of communities for inference algorithms. Our approach is agnostic to any specific network structure and applies to graphs with any structure representable by a DCBM, not only assortative ones. Experiments on synthetic and real benchmark networks show that our method detects communities comparable to those found by DCBM inference, while scaling linearly with the number of edges and communities; for instance, it processes a graph with 100,000 nodes and 2,000,000 edges in approximately 4 minutes. Moreover, the proposed initialization strategy significantly improves solution quality and reduces the number of iterations required by all tested inference algorithms. Overall, this work provides a scalable and robust framework for community detection and highlights the benefits of a matrix-factorization perspective for the DCBM.
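To make the matrix-factorization view concrete, the sketch below uses the standard DCBM parameterization, in which the expected adjacency is E[A_ij] = theta_i * theta_j * omega[g_i, g_j], and checks that it equals a product W @ omega @ W.T of nonnegative factors, where each row of W has exactly one nonzero entry. It is only an illustration of why the model has a constrained nonnegative factorization structure; the abstract does not state the paper's exact constraints, objective, or algorithm, so none of that is reproduced here. The function names and the toy values of `theta`, `labels`, and `omega` are assumptions chosen for the example, and self-loops are kept on the diagonal for simplicity.

```python
import numpy as np


def dcbm_expected_adjacency(theta, labels, omega):
    """Entry-wise DCBM mean: E[A_ij] = theta_i * theta_j * omega[g_i, g_j]."""
    n = len(labels)
    expected = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            expected[i, j] = theta[i] * theta[j] * omega[labels[i], labels[j]]
    return expected


def dcbm_factor(theta, labels, n_communities):
    """Nonnegative factor W = diag(theta) @ Z, where Z is the one-hot membership matrix."""
    n = len(labels)
    W = np.zeros((n, n_communities))
    W[np.arange(n), labels] = theta  # one nonzero per row: the node's degree parameter
    return W


# Hypothetical toy parameters (not taken from the paper).
theta = np.array([1.0, 0.5, 2.0, 1.5, 0.8, 1.2])   # degree parameters, all positive
labels = np.array([0, 0, 0, 1, 1, 1])              # community of each node
omega = np.array([[0.05, 0.30],
                  [0.30, 0.10]])                   # disassortative block matrix

W = dcbm_factor(theta, labels, n_communities=2)
factorized = W @ omega @ W.T                        # matrix form of the DCBM mean
entrywise = dcbm_expected_adjacency(theta, labels, omega)

# With one nonzero per row, the community is the column holding that entry.
recovered = W.argmax(axis=1)
```

The block matrix here is deliberately disassortative (more connections between communities than within them), which matches the abstract's point that the DCBM covers structures other than assortative ones. In an actual inference setting W and omega are unknown and must be estimated from an observed graph; this sketch only verifies the structural identity.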