Post-selection inference with a single realization of a network

📅 2025-08-15
🤖 AI Summary
Problem: When only a single realization of a network is available, selecting a parameter from the data introduces selection bias that invalidates standard inference. The motivating example is a linear combination of the mean within- and between-community connectivities, where the communities are themselves estimated from the network. Method: We propose splitting the single network into two (or more) networks on the same $n$ nodes, since conventional sample splitting is not possible. One network is used to select the data-driven parameter (for example, by estimating communities), and the other is used to conduct inference on that parameter. Contribution/Results: For weighted networks with Poisson or Gaussian edges, the two networks are independent; for Bernoulli edges they are dependent, so extra care is required. The resulting confidence intervals are shown to attain the nominal (selective) coverage, and the method is demonstrated in numerical simulations and on a network of relationships among dolphins in Doubtful Sound, New Zealand.

📝 Abstract
Given a dataset consisting of a single realization of a network, we consider conducting inference on a parameter selected from the data. In particular, we focus on the setting where the parameter of interest is a linear combination of the mean connectivities within and between estimated communities. Inference in this setting poses a challenge, since the communities are themselves estimated from the data. Furthermore, since only a single realization of the network is available, sample splitting is not possible. In this paper, we show that it is possible to split a single realization of a network consisting of $n$ nodes into two (or more) networks involving the same $n$ nodes; the first network can be used to select a data-driven parameter, and the second to conduct inference on that parameter. In the case of weighted networks with Poisson or Gaussian edges, we obtain two independent realizations of the network; by contrast, in the case of Bernoulli edges, the two realizations are dependent, and so extra care is required. We establish the theoretical properties of our estimators, in the sense of confidence intervals that attain the nominal (selective) coverage, and demonstrate their utility in numerical simulations and in application to a dataset representing the relationships among dolphins in Doubtful Sound, New Zealand.
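
The Poisson case described in the abstract can be illustrated with a minimal sketch based on binomial thinning, a standard way to split a Poisson count into two independent Poisson counts. Everything named here is an assumption made for illustration and is not taken from the paper: the function names (`poisson_thin`, `two_communities`, `block_mean_ci`), the split fraction `eps`, the toy eigenvector-sign community estimate, the Wald-type interval, and the simplification that the network is directed with self-loops so that all $n^2$ entries are independent.

```python
import numpy as np

def poisson_thin(A, eps, rng):
    """Split an integer count matrix A into two matrices of the same shape.

    If A[i, j] ~ Poisson(lam[i, j]), binomial thinning gives independent
    A_sel[i, j] ~ Poisson(eps * lam[i, j]) and
    A_inf[i, j] ~ Poisson((1 - eps) * lam[i, j]).
    """
    A_sel = rng.binomial(A, eps)   # each unit of weight goes to A_sel w.p. eps
    A_inf = A - A_sel              # the remainder is reserved for inference
    return A_sel, A_inf

def two_communities(A_sel):
    """Toy selection step (an assumption): split nodes by the sign of the
    eigenvector with the second-largest |eigenvalue| of the symmetrized matrix."""
    S = (A_sel + A_sel.T) / 2.0
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(np.abs(vals))[::-1]
    return (vecs[:, order[1]] > 0).astype(int)

def block_mean_ci(A_inf, labels, k, l, eps, z=1.96):
    """Wald interval for the mean connectivity from community k to community l,
    computed on the inference matrix and rescaled by 1 / (1 - eps)."""
    rows = np.flatnonzero(labels == k)
    cols = np.flatnonzero(labels == l)
    block = A_inf[np.ix_(rows, cols)]
    if block.size == 0:
        raise ValueError("empty block: one of the communities has no nodes")
    m = block.size
    est = block.mean() / (1.0 - eps)
    # For Poisson entries, Var(block mean) is (mean rate) / m; plug in the estimate.
    se = np.sqrt(block.mean() / m) / (1.0 - eps)
    return est, est - z * se, est + z * se

# Simulated two-community network (hypothetical rates, for illustration only).
rng = np.random.default_rng(0)
n = 60
truth = np.repeat([0, 1], n // 2)
lam = np.where(truth[:, None] == truth[None, :], 8.0, 2.0)
A = rng.poisson(lam)

A_sel, A_inf = poisson_thin(A, eps=0.5, rng=rng)
labels = two_communities(A_sel)                    # selection uses A_sel only
est, lo, hi = block_mean_ci(A_inf, labels, 0, 0, eps=0.5)  # inference uses A_inf only
print(f"within-community mean: {est:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
```

The design point is that `labels` is a function of `A_sel` alone, so conditional on the selected communities the entries of `A_inf` are still independent Poisson counts. The Bernoulli case mentioned in the abstract does not have this independence and is not covered by this sketch.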
Problem

Research questions and friction points this paper is trying to address.

Inference on data-selected network parameters with estimated communities
Splitting a single network realization for selection and inference
Handling dependent realizations in Bernoulli-edge networks
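
To make the first item concrete, one possible way to write the selected parameter is sketched below. The notation ($A$, $\hat{C}_k$, $\bar{\mu}_{k\ell}$, $u_{k\ell}$) is assumed here for illustration and may differ from the paper's; for simplicity it treats the network as directed with self-loops.

$$
\bar{\mu}_{k\ell} = \frac{1}{|\hat{C}_k|\,|\hat{C}_\ell|} \sum_{i \in \hat{C}_k} \sum_{j \in \hat{C}_\ell} \mathbb{E}[A_{ij}],
\qquad
\theta = \sum_{k,\ell} u_{k\ell}\, \bar{\mu}_{k\ell}
$$

Here $A$ is the $n \times n$ adjacency matrix, $\hat{C}_1, \dots, \hat{C}_K$ are the estimated communities, and $u_{k\ell}$ are fixed weights chosen by the analyst. Because the $\hat{C}_k$ depend on the data, $\theta$ is itself a data-dependent target, which is why estimating the communities and conducting inference on the same network is problematic.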
Innovation

Methods, ideas, or system contributions that make the work stand out.

Splitting a single network into multiple networks on the same nodes
Enabling inference on data-driven selected parameters
Providing theoretical guarantees for selective coverage
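
For Gaussian edges, one standard construction that yields two independent networks from a single one is sketched below. It is an illustration under stated assumptions, not necessarily the paper's construction: the edge variance `sigma**2` is assumed known, and the function name `gaussian_split` and the tuning parameter `tau` are hypothetical. If $A_{ij} \sim N(\mu_{ij}, \sigma^2)$ and $Z_{ij} \sim N(0, \sigma^2)$ is independent noise, then $A_{ij} + \tau Z_{ij}$ and $A_{ij} - Z_{ij}/\tau$ are jointly Gaussian with zero covariance, hence independent, and both have mean $\mu_{ij}$.

```python
import numpy as np

def gaussian_split(A, sigma, tau, rng):
    """Split a Gaussian-weighted adjacency matrix into two independent copies.

    Assumes every entry of A has known standard deviation sigma.
    Larger tau leaves more information in A_inf and less in A_sel.
    """
    Z = rng.normal(0.0, sigma, size=A.shape)  # auxiliary noise, same variance as A
    A_sel = A + tau * Z    # used for selection (e.g. community estimation)
    A_inf = A - Z / tau    # used for inference; independent of A_sel
    return A_sel, A_inf

# Simulated check with hypothetical values: constant mean 3, sigma 2.
rng = np.random.default_rng(1)
n = 200
sigma = 2.0
A = 3.0 + rng.normal(0.0, sigma, size=(n, n))

A_sel, A_inf = gaussian_split(A, sigma, tau=1.0, rng=rng)

# Empirical correlation across entries should be close to zero.
corr = np.corrcoef(A_sel.ravel(), A_inf.ravel())[0, 1]
print(f"correlation between the two networks: {corr:.4f}")
print(f"mean of A_inf: {A_inf.mean():.3f}")
```

The cost of independence is extra variance: each entry of `A_inf` has variance `sigma**2 * (1 + 1 / tau**2)`, so intervals built on it are wider than intervals that could (invalidly) reuse the full network.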
Ethan Ancell
Department of Statistics, University of Washington
Daniela Witten
Professor of Statistics & Biostatistics, Dorothy Gilford Endowed Chair, University of Washington
statistics, machine learning
Daniel Kessler
Department of Statistics and Operations Research, University of North Carolina at Chapel Hill; School of Data Science and Society, University of North Carolina at Chapel Hill