Non-asymptotic goodness-of-fit tests and model selection in valued stochastic blockmodels

📅 2025-10-15

🤖 AI Summary

This paper addresses finite-sample goodness-of-fit testing for valued (non-Bernoulli) stochastic blockmodels (SBMs), including Poisson, labeled, and censored-edge variants, a setting in which asymptotic tests can be unreliable for small networks. The method derives explicit Markov basis moves, uses them in MCMC to sample from a reference distribution, and defines goodness-of-fit statistics; for the labeled SBM it also studies the asymptotic behavior of those statistics, which informs the choice of the number of latent blocks. Simulations are used to verify Type I error rates and power. Applied to host–parasite interaction networks under the Poisson SBM, the method selects a different number of blocks than previous studies. Key contributions include: (i) a non-asymptotic goodness-of-fit testing framework for valued SBMs; (ii) unified handling of count-valued, labeled, and censored edges; and (iii) asymptotic results that complement the finite-sample tests when selecting the number of blocks.

📝 Abstract

A valued stochastic blockmodel (SBM) is a general way to view networked data in which nodes are grouped into blocks and links between them are measured by counts or labels. This family allows for varying dyad sampling schemes, thereby including the classical, Poisson, and labeled SBMs, as well as those in which some edge observations are censored. This paper addresses the question of testing goodness-of-fit of such non-Bernoulli SBMs, focusing in particular on finite-sample tests. We derive the explicit Markov basis moves necessary to generate samples from reference distributions and define goodness-of-fit statistics for determining model fit, comparable to those in the literature for related model families. For the labeled SBM, which includes in particular the censored-edge model, we study the asymptotic behavior of these statistics. One of the main purposes of testing goodness-of-fit of an SBM is to determine whether block membership of the nodes influences network formation. Power and Type I error rates are verified on simulated data. Additionally, we discuss the use of asymptotic results in selecting the number of blocks under the latent-block modeling assumption. The method derived for the Poisson SBM is applied to ecological networks of host-parasite interactions. Our data analysis conclusions differ from previous results in the literature in the number of blocks selected for the species.
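To make the finite-sample idea concrete, here is a worked sketch of why conditioning yields a parameter-free reference distribution in the simplest case. It assumes a Poisson SBM with known block labels $z$ and independent dyads; the symbols $\lambda_{kl}$, $D_{kl}$, and $T_{kl}$ are notation introduced here for illustration and may not match the paper's.

```latex
% Assumed model: independent Poisson counts with block-pair rates.
X_{ij} \mid z \;\sim\; \mathrm{Poisson}\bigl(\lambda_{z_i z_j}\bigr),
\qquad
T_{kl} \;=\; \sum_{(i,j) \in D_{kl}} X_{ij},

% D_{kl} is the set of dyads whose endpoints lie in blocks k and l.
% Conditioning on the sufficient statistics T = t removes the rates:
P\bigl(X = x \mid T = t\bigr)
\;=\;
\prod_{k \le l}
\frac{t_{kl}!}{\prod_{(i,j) \in D_{kl}} x_{ij}!}
\,\lvert D_{kl} \rvert^{-t_{kl}} .
```

Because the right-hand side does not involve $\lambda$, a p-value computed against this conditional distribution is exact for any network size. In this known-block case the distribution is a product of equal-probability multinomials; Markov basis moves matter when the fiber of tables sharing the same sufficient statistics cannot be sampled directly.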

Problem

Research questions and friction points this paper is trying to address.

Developing finite-sample goodness-of-fit tests for non-Bernoulli stochastic blockmodels

Determining whether node block membership influences network formation patterns

Selecting an appropriate number of blocks in latent-block network models

Innovation

Methods, ideas, or system contributions that make the work stand out.

Develops finite-sample goodness-of-fit tests

Derives Markov bases for reference distributions

Applies method to Poisson SBM ecological networks
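The combination of Markov moves, a reference distribution, and a fit statistic can be sketched in a few lines. The following is a minimal illustration, not the paper's construction: it assumes a Poisson SBM on an undirected network without self-loops, known block labels `z`, and a Pearson chi-square statistic as a stand-in for the paper's statistics. All function names are hypothetical. The move shifts one unit of count between two dyads in the same block pair, which preserves the block-pair totals, and a Metropolis step targets the conditional distribution proportional to 1 / prod(x_d!).

```python
import random
from collections import defaultdict


def block_pairs(n, z):
    """Group undirected dyads (i, j), i < j, by their unordered block pair."""
    groups = defaultdict(list)
    for i in range(n):
        for j in range(i + 1, n):
            groups[tuple(sorted((z[i], z[j])))].append((i, j))
    return groups


def chi_square(x, groups):
    """Pearson statistic of the counts against the fitted block-pair means."""
    stat = 0.0
    for dyads in groups.values():
        mean = sum(x[d] for d in dyads) / len(dyads)
        if mean > 0:
            stat += sum((x[d] - mean) ** 2 for d in dyads) / mean
    return stat


def mcmc_p_value(x, z, n, steps=20000, thin=20, seed=0):
    """Estimate a conditional p-value by walking the fiber of x.

    x maps each dyad (i, j), i < j, to a non-negative integer count.
    """
    rng = random.Random(seed)
    groups = block_pairs(n, z)
    # Only block pairs with at least two dyads admit a move.
    movable = [dyads for dyads in groups.values() if len(dyads) > 1]
    observed = chi_square(x, groups)
    cur = dict(x)
    exceed = total = 0
    for t in range(steps):
        if movable:
            dyads = rng.choice(movable)
            a, b = rng.sample(dyads, 2)
            # Propose moving one unit from a to b (symmetric proposal).
            # Metropolis ratio for a target proportional to 1 / prod(x_d!).
            if cur[a] > 0 and rng.random() < cur[a] / (cur[b] + 1):
                cur[a] -= 1
                cur[b] += 1
        if t % thin == 0:
            total += 1
            exceed += chi_square(cur, groups) >= observed
    # Add-one correction counts the observed network itself.
    return (exceed + 1) / (total + 1)
```

A small p-value suggests the counts within some block pair are more uneven than the Poisson SBM with these labels would produce. Calling `mcmc_p_value(x, z, n)` with `x` keyed by dyads `(i, j)`, `i < j`, returns a value in (0, 1]; the chain starts at the observed network, so no burn-in is needed for validity under the null, though `steps` and `thin` would need tuning on larger networks.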
