Maximum Likelihood Estimation on Stochastic Blockmodels for Directed Graph Clustering

📅 2024-03-28

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the modeling and optimization challenges that edge directionality raises in directed graph clustering. We formulate clustering as maximum likelihood estimation (MLE) on the directed stochastic block model (DSBM) and show that the MLE objective is equivalent to a flow optimization problem that jointly accounts for edge density and edge orientation. Building on this formulation, we propose two interpretable algorithms: a spectral clustering algorithm and a semidefinite programming (SDP) based algorithm. Using matrix perturbation theory, we derive an upper bound on the number of vertices misclustered by the spectral algorithm. We compare the proposed algorithms, quantitatively and qualitatively, with existing directed clustering methods on synthetic and real-world directed networks.

📝 Abstract

This paper studies the directed graph clustering problem through the lens of statistics, where we formulate clustering as estimating underlying communities in the directed stochastic block model (DSBM). We conduct the maximum likelihood estimation (MLE) on the DSBM and thereby ascertain the most probable community assignment given the observed graph structure. In addition to the statistical point of view, we further establish the equivalence between this MLE formulation and a novel flow optimization heuristic, which jointly considers two important directed graph statistics: edge density and edge orientation. Building on this new formulation of directed clustering, we introduce two efficient and interpretable directed clustering algorithms, a spectral clustering algorithm and a semidefinite programming based clustering algorithm. We provide a theoretical upper bound on the number of misclustered vertices of the spectral clustering algorithm using tools from matrix perturbation theory. We compare, both quantitatively and qualitatively, our proposed algorithms with existing directed clustering methods on both synthetic and real-world data, thus providing further ground to our theoretical contributions.
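To make the two statistics named in the abstract concrete, here is a minimal sketch that samples a two-community directed graph and measures edge density and edge orientation between the communities. The parameterisation is an assumption chosen for illustration, not taken from the paper: `p` and `q` are within- and cross-community connection probabilities, and `eta` is the probability that a cross-community edge points from community 0 to community 1. The function names `sample_dsbm` and `cross_block_statistics` are hypothetical.

```python
import numpy as np


def sample_dsbm(n_per_block, p, q, eta, rng):
    """Sample a two-block directed graph (illustrative parameterisation).

    Each unordered pair gets at most one directed edge. Same-block pairs
    are connected with probability p and oriented uniformly at random.
    Cross-block pairs are connected with probability q and point from
    block 0 to block 1 with probability eta.
    """
    n = 2 * n_per_block
    labels = np.repeat([0, 1], n_per_block)  # first half block 0, second half block 1
    A = np.zeros((n, n), dtype=int)
    for u in range(n):
        for v in range(u + 1, n):
            same = labels[u] == labels[v]
            if rng.random() >= (p if same else q):
                continue  # no edge between u and v
            # Labels are sorted and u < v, so a cross-block pair has
            # u in block 0 and v in block 1.
            forward_prob = 0.5 if same else eta
            if rng.random() < forward_prob:
                A[u, v] = 1
            else:
                A[v, u] = 1
    return A, labels


def cross_block_statistics(A, labels):
    """Return (edge density, edge orientation) between blocks 0 and 1."""
    in0 = labels == 0
    in1 = labels == 1
    forward = A[np.ix_(in0, in1)].sum()   # edges from block 0 to block 1
    backward = A[np.ix_(in1, in0)].sum()  # edges from block 1 to block 0
    density = (forward + backward) / (in0.sum() * in1.sum())
    orientation = forward / max(forward + backward, 1)
    return float(density), float(orientation)


rng = np.random.default_rng(0)
A, labels = sample_dsbm(n_per_block=50, p=0.3, q=0.3, eta=0.9, rng=rng)
density, orientation = cross_block_statistics(A, labels)
print(f"cross-block density: {density:.3f}, orientation: {orientation:.3f}")
```

With `p` equal to `q`, as in the usage above, edge density alone cannot distinguish the communities, and only the orientation imbalance controlled by `eta` carries the signal. This is the kind of setting where a criterion that considers both statistics jointly is useful.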

Problem

Research questions and friction points this paper is trying to address.

Extends spectral clustering to directed graphs

Uses likelihood estimation on block models

Evaluates accuracy against existing directed clustering methods on synthetic and real-world data

Innovation

Methods, ideas, or system contributions that make the work stand out.

Maximum likelihood estimation for directed graphs

Spectral relaxation with theoretical error bound

Self-adaptive spectral clustering method

🔎 Similar Papers

Improved Community Detection using Stochastic Block Models