Post-selection inference with a single realization of a network

📅 2025-08-15

🤖 AI Summary

In single-network settings, selecting a parameter from the data introduces selection bias that invalidates naive inference. For example, the parameter might be a linear combination of within- and between-community mean connectivities, where the communities are themselves estimated from the same network.

Method: We propose a "network splitting" framework that randomly splits a single observed network into two networks on the same node set. Conventional sample splitting is unavailable because only one realization is observed. One network guides data-driven selection (e.g., community detection), and the other is used for inference, which removes the selection bias induced by community estimation.

Contribution/Results: The method accommodates Poisson, Gaussian, and Bernoulli edge models. For Poisson and Gaussian edges the two networks are independent; for Bernoulli edges they are dependent, and the inference accounts for this dependence. The resulting confidence intervals attain the nominal selective coverage. We establish the theoretical properties of the estimators and demonstrate their utility in numerical simulations and in an application to a social network of dolphins in Doubtful Sound, New Zealand.
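For Poisson edges, one standard way to split a network is binomial thinning (the "data thinning" idea). The sketch below is an illustration under assumptions, not necessarily the paper's exact construction: it assumes a symmetric count matrix with no self-loops, a splitting fraction `eps`, and a hypothetical helper name `thin_poisson_network`. It checks that the two pieces add back to the original network and are empirically uncorrelated.

```python
import numpy as np

def thin_poisson_network(A, eps, rng):
    """Split a symmetric, zero-diagonal Poisson count matrix A into A_train + A_test = A.

    Each edge count A[i, j] (i < j) is thinned as A_train[i, j] ~ Binomial(A[i, j], eps).
    If A[i, j] ~ Poisson(mu_ij), then A_train[i, j] ~ Poisson(eps * mu_ij) and
    A_test[i, j] ~ Poisson((1 - eps) * mu_ij), and the two are independent.
    """
    upper = np.triu(A, k=1)               # each undirected edge counted once
    train = rng.binomial(upper, eps)      # zero counts stay zero
    A_train = train + train.T             # restore symmetry
    A_test = A - A_train
    return A_train, A_test

rng = np.random.default_rng(0)
n, mu, eps = 200, 3.0, 0.5
upper = np.triu(rng.poisson(mu, size=(n, n)), k=1)
A = upper + upper.T                        # Poisson(3) edge counts, no self-loops
A_train, A_test = thin_poisson_network(A, eps, rng)

iu = np.triu_indices(n, k=1)
print(A_train[iu].mean(), A_test[iu].mean())          # both near eps * mu = 1.5
print(np.corrcoef(A_train[iu], A_test[iu])[0, 1])     # near 0
```

Setting `eps` closer to 1 puts more of the signal into the selection network and less into the inference network.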

📝 Abstract

Given a dataset consisting of a single realization of a network, we consider conducting inference on a parameter selected from the data. In particular, we focus on the setting where the parameter of interest is a linear combination of the mean connectivities within and between estimated communities. Inference in this setting poses a challenge, since the communities are themselves estimated from the data. Furthermore, since only a single realization of the network is available, sample splitting is not possible. In this paper, we show that it is possible to split a single realization of a network consisting of $n$ nodes into two (or more) networks involving the same $n$ nodes; the first network can be used to select a data-driven parameter, and the second to conduct inference on that parameter. In the case of weighted networks with Poisson or Gaussian edges, we obtain two independent realizations of the network; by contrast, in the case of Bernoulli edges, the two realizations are dependent, and so extra care is required. We establish the theoretical properties of our estimators, in the sense of confidence intervals that attain the nominal (selective) coverage, and demonstrate their utility in numerical simulations and in application to a dataset representing the relationships among dolphins in Doubtful Sound, New Zealand.
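For Gaussian edges with a known edge standard deviation `sigma`, an analogous split draws the selection network conditionally on the observed weights. This is a sketch under that known-variance assumption, and the helper `thin_gaussian_network` is hypothetical. Because `A_train` and `A_test` are jointly Gaussian with zero covariance, they are independent, with means scaled by `eps` and `1 - eps`.

```python
import numpy as np

def thin_gaussian_network(A, eps, sigma, rng):
    """Split a symmetric Gaussian-weighted matrix A (known edge s.d. sigma) into A_train + A_test = A.

    Draws A_train[i, j] | A[i, j] ~ N(eps * A[i, j], eps * (1 - eps) * sigma**2).
    If A[i, j] ~ N(mu_ij, sigma**2), then A_train ~ N(eps * mu_ij, eps * sigma**2) and
    A_test = A - A_train ~ N((1 - eps) * mu_ij, (1 - eps) * sigma**2), independently.
    """
    n = A.shape[0]
    noise = np.triu(rng.standard_normal((n, n)), k=1)    # one draw per undirected edge
    upper = eps * np.triu(A, k=1) + np.sqrt(eps * (1 - eps)) * sigma * noise
    A_train = upper + upper.T                             # restore symmetry
    return A_train, A - A_train

rng = np.random.default_rng(0)
n, mu, sigma, eps = 200, 2.0, 1.5, 0.3
upper = np.triu(rng.normal(mu, sigma, size=(n, n)), k=1)
A = upper + upper.T                                       # Gaussian edge weights, zero diagonal
A_train, A_test = thin_gaussian_network(A, eps, sigma, rng)

iu = np.triu_indices(n, k=1)
print(A_train[iu].mean(), A_test[iu].mean())      # near 0.6 and 1.4
print(A_train[iu].var(), A_test[iu].var())        # near 0.675 and 1.575
print(np.corrcoef(A_train[iu], A_test[iu])[0, 1]) # near 0
```

If `sigma` is unknown, this construction does not apply directly, since the added noise must match the true edge variance.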

Problem

Research questions and friction points this paper is trying to address.

Inference on data-selected network parameters with estimated communities

Splitting a single network realization for selection and inference

Handling dependent realizations in Bernoulli-edge networks
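To see why Bernoulli edges need extra care, consider the "data fission" construction for binary data, used here only as an assumed illustration (the paper's own Bernoulli scheme may differ). Each edge indicator is flipped independently with probability `gamma` to form a selection network `A_sel`, and inference then uses `A` conditional on `A_sel`. The two networks are dependent. By Bayes' rule, conditioning on `A_sel` shifts each edge probability away from p, as computed by the hypothetical helper `prob_edge_given_flipped`.

```python
import numpy as np

def fission_bernoulli(A, gamma, rng):
    """Return A_sel = A XOR Z with Z_ij ~ Bernoulli(gamma), keeping symmetry and a zero diagonal."""
    n = A.shape[0]
    flips = np.triu(rng.random((n, n)) < gamma, k=1)   # one flip decision per undirected edge
    flips = flips | flips.T
    return np.where(flips, 1 - A, A)

def prob_edge_given_flipped(p, gamma, a_sel):
    """P(A_ij = 1 | A_sel_ij = a_sel) when A_ij ~ Bernoulli(p), by Bayes' rule."""
    if a_sel == 1:
        return p * (1 - gamma) / (p * (1 - gamma) + (1 - p) * gamma)
    return p * gamma / (p * gamma + (1 - p) * (1 - gamma))

rng = np.random.default_rng(0)
n, p, gamma = 400, 0.3, 0.2
upper = np.triu((rng.random((n, n)) < p).astype(int), k=1)
A = upper + upper.T                                     # Bernoulli(p) edges, no self-loops
A_sel = fission_bernoulli(A, gamma, rng)

iu = np.triu_indices(n, k=1)
a, s = A[iu], A_sel[iu]
emp1 = a[s == 1].mean()   # empirical P(A = 1 | A_sel = 1)
emp0 = a[s == 0].mean()   # empirical P(A = 1 | A_sel = 0)
print(emp1, prob_edge_given_flipped(p, gamma, 1))   # empirical vs. Bayes value
print(emp0, prob_edge_given_flipped(p, gamma, 0))   # empirical vs. Bayes value
```

The flip probability trades off the two stages. With `gamma = 0`, `A_sel` is a copy of `A` and nothing is left for inference. With `gamma = 0.5`, `A_sel` is pure noise, independent of `A`, and the conditional probability reduces to p.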

Innovation

Methods, ideas, or system contributions that make the work stand out.

Splitting a single network into multiple networks on the same nodes

Enabling inference on data-selected parameters

Providing theoretical guarantees for selective coverage
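A small simulation can make the selective-coverage claim concrete. It rests on hypothetical choices that are not taken from the paper: Poisson edges with a constant mean (so there are no true communities), binomial thinning with `eps = 0.5`, and spectral bisection at the median of the second-leading eigenvector as the community estimator. The target is the within-minus-between mean connectivity of the estimated communities, with a 95% normal-approximation interval. The sketch compares network splitting with "double dipping", which selects and infers on the same network.

```python
import numpy as np

def poisson_network(M, rng):
    """Symmetric Poisson adjacency matrix with mean matrix M and no self-loops."""
    upper = np.triu(rng.poisson(M), k=1)
    return upper + upper.T

def thin(A, eps, rng):
    """Binomial thinning: A_train[i, j] ~ Binomial(A[i, j], eps), A_test = A - A_train."""
    train = rng.binomial(np.triu(A, k=1), eps)
    A_train = train + train.T
    return A_train, A - A_train

def spectral_bisection(A):
    """Hypothetical community estimate: median split of the second-leading eigenvector."""
    _, vecs = np.linalg.eigh(A.astype(float))   # eigenvalues in ascending order
    v = vecs[:, -2]
    return v > np.median(v)

def within_minus_between(A, z, scale=1.0):
    """Within- minus between-community mean of A / scale, with a Poisson plug-in SE.

    If A[i, j] ~ Poisson(scale * mu_ij), then x = A / scale has mean mu_ij and
    variance mu_ij / scale, which is estimated here by x_ij / scale.
    """
    iu = np.triu_indices(len(z), k=1)
    same = (z[:, None] == z[None, :])[iu]
    x = A[iu] / scale
    w, b = x[same], x[~same]
    est = w.mean() - b.mean()
    var = w.sum() / (scale * w.size**2) + b.sum() / (scale * b.size**2)
    return est, np.sqrt(var)

def coverage(n=60, mu=3.0, eps=0.5, reps=200, seed=1):
    rng = np.random.default_rng(seed)
    M = np.full((n, n), mu)                         # null model: no true communities
    hits_split = hits_naive = 0
    for _ in range(reps):
        A = poisson_network(M, rng)
        # Network splitting: select on A_train, infer on the independent A_test.
        A_train, A_test = thin(A, eps, rng)
        z = spectral_bisection(A_train)
        est, se = within_minus_between(A_test, z, scale=1 - eps)
        truth, _ = within_minus_between(M, z)       # target evaluated at the true means
        hits_split += int(abs(est - truth) <= 1.96 * se)
        # Double dipping: select and infer on the same A.
        z = spectral_bisection(A)
        est, se = within_minus_between(A, z)
        truth, _ = within_minus_between(M, z)
        hits_naive += int(abs(est - truth) <= 1.96 * se)
    return hits_split / reps, hits_naive / reps

split_cov, naive_cov = coverage()
print(f"network splitting: {split_cov:.2f}, double dipping: {naive_cov:.2f}")
```

Under this null the target is zero for every selected partition. Double dipping should therefore cover far less often than 95%, because the estimated communities are chosen to make within-community edges look stronger. The split procedure should stay near the nominal level. Exact numbers depend on the seed and the number of replications.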