Nested stochastic block model for simultaneously clustering networks and nodes

📅 2023-07-18
🏛️ arXiv.org
📈 Citations: 2
Influential: 0
🤖 AI Summary
This work addresses the challenge of simultaneously performing network-level clustering and node-level community detection in multi-network analysis. We propose a Bayesian nonparametric model based on the nested Dirichlet process (NDP), enabling joint inference of both the number of network classes and the number of communities within each network. The model accommodates unlabeled, structurally heterogeneous networks with unequal node sets, overcoming key limitations in analyzing anonymized nodes and networks of varying size. For efficient posterior inference, we develop several Markov chain Monte Carlo algorithms: a standard Gibbs sampler, a collapsed Gibbs sampler, and two blocked Gibbs samplers. Extensive experiments demonstrate that the method accurately recovers the hierarchical clustering structure on synthetic data and performs well on two real-world social network datasets. To our knowledge, this is the first unified, adaptive, and scalable framework for multi-network co-analysis, offering principled uncertainty quantification and automatic complexity control without requiring prespecified numbers of clusters or communities.
📝 Abstract
We introduce the nested stochastic block model (NSBM) to cluster a collection of networks while simultaneously detecting communities within each network. NSBM has several appealing features including the ability to work on unlabeled networks with potentially different node sets, the flexibility to model heterogeneous communities, and the means to automatically select the number of classes for the networks and the number of communities within each network. This is accomplished via a Bayesian model, with a novel application of the nested Dirichlet process (NDP) as a prior to jointly model the between-network and within-network clusters. The dependency introduced by the network data creates nontrivial challenges for the NDP, especially in the development of efficient samplers. For posterior inference, we propose several Markov chain Monte Carlo algorithms including a standard Gibbs sampler, a collapsed Gibbs sampler, and two blocked Gibbs samplers that ultimately return two levels of clustering labels from both within and across the networks. Extensive simulation studies are carried out which demonstrate that the model provides very accurate estimates of both levels of the clustering structure. We also apply our model to two social network datasets that cannot be analyzed using any previous method in the literature due to the anonymity of the nodes and the varying number of nodes in each network.
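As a rough illustration of the generative process the abstract describes, here is a hypothetical sketch (not the authors' code): truncated stick-breaking weights stand in for the nested Dirichlet process, each network class carries its own community weights and block-connectivity matrix, and networks of unequal size are drawn from the resulting stochastic block models. All truncation levels, concentration parameters, and network sizes below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def stick_breaking(alpha, K, rng):
    """Truncated GEM(alpha) stick-breaking weights of length K."""
    betas = rng.beta(1.0, alpha, size=K)
    betas[-1] = 1.0  # absorb the remaining mass at the truncation level
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - betas[:-1])))
    return betas * remaining

def sample_network(n_nodes, comm_weights, B, rng):
    """Draw node community labels and a symmetric SBM adjacency matrix."""
    z = rng.choice(len(comm_weights), size=n_nodes, p=comm_weights)
    probs = B[np.ix_(z, z)]                      # edge probabilities by block
    upper = rng.random((n_nodes, n_nodes)) < probs
    A = np.triu(upper, k=1)
    return z, (A | A.T).astype(int)              # symmetrize, no self-loops

K_classes, K_comms = 3, 4                        # truncation levels (assumed)
class_weights = stick_breaking(1.0, K_classes, rng)

# Each network class has its own community weights and block matrix,
# mirroring the nested (two-level) structure of the prior.
class_params = []
for _ in range(K_classes):
    w = stick_breaking(1.0, K_comms, rng)
    B = rng.beta(1.0, 1.0, size=(K_comms, K_comms))
    class_params.append((w, (B + B.T) / 2))      # symmetric block probabilities

networks = []
for n_nodes in [30, 45, 60]:                     # unequal node sets, as in the paper
    c = rng.choice(K_classes, p=class_weights)   # network-level class label
    w, B = class_params[c]
    z, A = sample_network(n_nodes, w, B, rng)    # node-level community labels
    networks.append({"class": c, "communities": z, "adjacency": A})
```

Posterior inference in the paper runs in the opposite direction: given only the adjacency matrices, the Gibbs samplers recover both the network-level labels and the node-level labels.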
Problem

Research questions and friction points this paper is trying to address.

How to cluster a collection of networks while detecting communities within each network.
How to handle unlabeled (anonymized) networks with potentially different node sets.
How to select the number of network classes and within-network communities automatically.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Nested stochastic block model (NSBM) for joint network clustering and community detection
Bayesian formulation with a novel nested Dirichlet process prior
Standard, collapsed, and blocked Gibbs samplers for posterior inference
Nathaniel Josephs
Department of Biostatistics, Yale University
A. Amini
Department of Statistics, UCLA
M. Paez
Department of Statistical Methods, Federal University of Rio de Janeiro
Lizhen Lin
Department of Mathematics, The University of Maryland
Geometry & Statistics · Bayesian Theory · Statistics Theory of Deep Learning · Geometric Deep Learning