🤖 AI Summary
Problem: When only a single realization of a network is available, selecting a parameter from the data, such as a linear combination of the average connectivities within and between communities that are themselves estimated from the same network, introduces selection bias that invalidates standard statistical inference.
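To make the target parameter concrete, here is a minimal Python sketch of the plug-in estimate $\hat\theta = \sum_{k,l} U_{kl}\hat B_{kl}$, where $\hat B_{kl}$ is the average edge weight between estimated communities $k$ and $l$. It assumes an undirected network without self-loops, community labels coded $0, \dots, K-1$, and at least two nodes per community. The helper names `block_means` and `linear_combination` and the weight matrix `U` are hypothetical illustrations, not taken from the paper.

```python
import numpy as np


def block_means(A, labels, K):
    """Average edge weight within/between estimated communities (hypothetical helper).

    Assumes an undirected network with no self-loops, so diagonal entries
    A[i, i] are excluded when averaging within a community.
    """
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels)
    B = np.zeros((K, K))
    for k in range(K):
        for l in range(K):
            rows, cols = labels == k, labels == l
            sub = A[np.ix_(rows, cols)]
            if k == l:
                m = rows.sum()
                if m < 2:
                    raise ValueError("each community needs at least two nodes")
                # average over ordered pairs (i, j) with i != j
                B[k, l] = (sub.sum() - np.trace(sub)) / (m * (m - 1))
            else:
                B[k, l] = sub.mean()
    return B


def linear_combination(A, labels, K, U):
    """Plug-in estimate theta_hat = sum_{k,l} U[k, l] * B_hat[k, l]."""
    return float(np.sum(np.asarray(U) * block_means(A, labels, K)))


A = np.array([[0, 2, 1, 0],
              [2, 0, 0, 1],
              [1, 0, 0, 3],
              [0, 1, 3, 0]])
labels = [0, 0, 1, 1]
U = np.array([[0.5, -0.5], [-0.5, 0.5]])  # average within minus average between
print(block_means(A, labels, 2))
print(linear_combination(A, labels, 2, U))
```

If `labels` were estimated from `A` itself, a naive confidence interval around this quantity would not have valid coverage; that is exactly the selection problem described above.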
Method: We propose a “network splitting” framework that randomly splits a single realization of the network into two networks on the same node set, providing a substitute for sample splitting, which is unavailable when only one realization exists. The first network guides data-driven parameter selection (e.g., community detection), and the second is used for subsequent inference, thereby removing the selection bias induced by community estimation.
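The split-select-infer workflow can be sketched for Poisson edges as follows. This is an illustrative sketch under assumptions of our own, not the paper's implementation: Poisson thinning via binomial draws as the splitting step, a crude two-community spectral split (sign of the eigenvector for the second-largest eigenvalue) as the selection step, and an approximate Wald interval with plug-in Poisson variance as the inference step. The function names, the thinning fraction `eps`, and the weight matrix `U` are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def poisson_split(A, eps, rng):
    """Thin each upper-triangular count: A1 ~ Binomial(A, eps), A2 = A - A1.

    If A[i, j] ~ Poisson(lam), then A1[i, j] ~ Poisson(eps * lam) and
    A2[i, j] ~ Poisson((1 - eps) * lam), independently.
    """
    A = np.asarray(A, dtype=np.int64)
    iu = np.triu_indices(A.shape[0], k=1)
    A1 = np.zeros_like(A)
    A1[iu] = rng.binomial(A[iu], eps)
    A1 = A1 + A1.T  # mirror so both halves stay undirected
    return A1, A - A1


def select_two_communities(A1):
    """Hypothetical selection step: sign of the eigenvector for the 2nd-largest eigenvalue."""
    _, vecs = np.linalg.eigh(A1.astype(float))  # eigenvalues in ascending order
    return (vecs[:, -2] >= 0).astype(int)


def wald_ci(A2, labels, U, eps, level=0.95):
    """Approximate CI for sum_{k,l} U[k, l] * B[k, l], using only A2.

    B[k, l] is the mean edge weight of the original network over node pairs
    in communities (k, l); A2 has mean (1 - eps) * B, so we rescale.
    """
    K = U.shape[0]
    if np.bincount(labels, minlength=K).min() < 2:
        raise ValueError("each selected community needs at least two nodes")
    est, var = 0.0, 0.0
    for k in range(K):
        for l in range(k, K):  # undirected: visit each unordered block once
            rows, cols = labels == k, labels == l
            sub = A2[np.ix_(rows, cols)].astype(float)
            if k == l:
                m = rows.sum()
                n_pairs = m * (m - 1) / 2
                total = np.triu(sub, 1).sum()
                w = U[k, k]
            else:
                n_pairs = rows.sum() * cols.sum()
                total = sub.sum()
                w = U[k, l] + U[l, k]  # B[k, l] == B[l, k]
            b = total / n_pairs
            est += w * b
            var += w ** 2 * b / n_pairs  # Poisson: variance equals mean
    z = NormalDist().inv_cdf(0.5 + level / 2)
    est, se = est / (1 - eps), np.sqrt(var) / (1 - eps)
    return est - z * se, est + z * se


rng = np.random.default_rng(0)
n = 60
truth = np.repeat([0, 1], n // 2)
Lam = np.where(truth[:, None] == truth[None, :], 4.0, 1.0)
A = np.triu(rng.poisson(Lam), 1)
A = A + A.T  # undirected, no self-loops

A1, A2 = poisson_split(A, eps=0.5, rng=rng)
labels = select_two_communities(A1)        # select on A1 only
U = np.array([[0.5, -0.5], [-0.5, 0.5]])   # within-minus-between contrast
lo, hi = wald_ci(A2, labels, U, eps=0.5)   # infer on A2 only
print(f"95% CI: ({lo:.2f}, {hi:.2f})")
```

Because `labels` depends only on `A1`, and `A2` is independent of `A1` under Poisson thinning, the interval treats the selected communities as if they had been fixed in advance; dividing by `1 - eps` rescales the estimate back to the mean connectivity of the original network.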
Contribution/Results: The method accommodates Poisson, Gaussian, and Bernoulli edge models. For Poisson and Gaussian edges the two resulting networks are independent, whereas for Bernoulli edges they are dependent and require extra care. We establish that the resulting confidence intervals attain the nominal (selective) coverage, and demonstrate the method in numerical simulations and in an application to a network of relationships among dolphins in Doubtful Sound, New Zealand.
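For Gaussian edges with known variance $\sigma^2$, a standard way to obtain two independent networks is to add and subtract auxiliary Gaussian noise. The sketch below uses that textbook construction as an assumption; the paper's exact parametrization may differ, and the names `gaussian_split` and `gamma` are hypothetical. Since $\operatorname{Cov}(A + \gamma Z, A - Z/\gamma) = \sigma^2 - \sigma^2 = 0$ and the pair is jointly Gaussian, the two halves are independent.

```python
import numpy as np


def gaussian_split(A, sigma, gamma, rng):
    """Split a symmetric network with N(mu_ij, sigma^2) edges (sigma known).

    With Z ~ N(0, sigma^2) drawn independently of A on each node pair:
      A1 = A + gamma * Z   has variance (1 + gamma**2) * sigma**2
      A2 = A - Z / gamma   has variance (1 + 1 / gamma**2) * sigma**2
    Cov(A1, A2) = sigma**2 - sigma**2 = 0, and (A1, A2) is jointly Gaussian,
    so the two networks are independent and both have mean mu.
    """
    A = np.asarray(A, dtype=float)
    iu = np.triu_indices(A.shape[0], k=1)
    Z = np.zeros_like(A)
    Z[iu] = rng.normal(0.0, sigma, size=len(iu[0]))
    Z = Z + Z.T  # same auxiliary noise for (i, j) and (j, i)
    return A + gamma * Z, A - Z / gamma


rng = np.random.default_rng(1)
n, mu, sigma, gamma = 400, 1.0, 2.0, 1.0
iu = np.triu_indices(n, k=1)
A = np.zeros((n, n))
A[iu] = rng.normal(mu, sigma, size=len(iu[0]))
A = A + A.T

A1, A2 = gaussian_split(A, sigma, gamma, rng)
r = np.corrcoef(A1[iu], A2[iu])[0, 1]
print(f"empirical correlation between the two halves: {r:.4f}")
```

The known-variance assumption matters: if $\sigma^2$ is replaced by an estimate, the covariance no longer cancels exactly. A smaller `gamma` makes `A1` less noisy (better selection) at the cost of a noisier `A2` (wider intervals).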
📝 Abstract
Given a dataset consisting of a single realization of a network, we consider conducting inference on a parameter selected from the data. In particular, we focus on the setting where the parameter of interest is a linear combination of the mean connectivities within and between estimated communities. Inference in this setting poses a challenge, since the communities are themselves estimated from the data. Furthermore, since only a single realization of the network is available, sample splitting is not possible. In this paper, we show that it is possible to split a single realization of a network consisting of $n$ nodes into two (or more) networks involving the same $n$ nodes; the first network can be used to select a data-driven parameter, and the second to conduct inference on that parameter. In the case of weighted networks with Poisson or Gaussian edges, we obtain two independent realizations of the network; by contrast, in the case of Bernoulli edges, the two realizations are dependent, and so extra care is required. We establish the theoretical properties of our estimators, namely that the resulting confidence intervals attain the nominal (selective) coverage, and demonstrate their utility in numerical simulations and in application to a dataset representing the relationships among dolphins in Doubtful Sound, New Zealand.
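For Bernoulli edges, one simple way to see why the two networks are dependent is a flip-with-noise construction taken, as an assumption, from the data fission literature; the paper's own Bernoulli construction may differ. Each edge indicator $A_{ij} \sim \text{Bernoulli}(p_{ij})$ is flipped with probability $\gamma$ to form the first network, while the second network is the original $A$. The sketch (hypothetical names `bernoulli_split` and `conditional_edge_prob`) checks empirically that, given the first network, each edge of the second follows the Bernoulli law implied by Bayes' rule rather than its marginal law.

```python
import numpy as np


def bernoulli_split(A, gamma, rng):
    """Hypothetical flip-with-noise split for a symmetric 0/1 adjacency matrix.

    A1 = A XOR Z with Z ~ Bernoulli(gamma) on each node pair, and A2 = A.
    Marginally A1[i, j] ~ Bernoulli(p (1 - gamma) + (1 - p) gamma), but A2
    is NOT independent of A1.
    """
    A = np.asarray(A, dtype=np.int64)
    iu = np.triu_indices(A.shape[0], k=1)
    Z = np.zeros_like(A)
    Z[iu] = rng.binomial(1, gamma, size=len(iu[0]))
    Z = Z + Z.T  # flip (i, j) and (j, i) together
    A1 = np.bitwise_xor(A, Z)  # flip the edge indicator where Z == 1
    return A1, A.copy()


def conditional_edge_prob(p, gamma, a1):
    """P(A2[i, j] = 1 | A1[i, j] = a1) when A[i, j] ~ Bernoulli(p), via Bayes' rule."""
    if a1 == 1:
        return p * (1 - gamma) / (p * (1 - gamma) + (1 - p) * gamma)
    return p * gamma / (p * gamma + (1 - p) * (1 - gamma))


rng = np.random.default_rng(2)
n, p, gamma = 500, 0.3, 0.2
iu = np.triu_indices(n, k=1)
A = np.zeros((n, n), dtype=np.int64)
A[iu] = rng.binomial(1, p, size=len(iu[0]))
A = A + A.T

A1, A2 = bernoulli_split(A, gamma, rng)
x1, x2 = A1[iu], A2[iu]
print("P(A1 = 1):", x1.mean())
print("P(A2 = 1 | A1 = 1):", x2[x1 == 1].mean(), "vs", conditional_edge_prob(p, gamma, 1))
print("P(A2 = 1 | A1 = 0):", x2[x1 == 0].mean(), "vs", conditional_edge_prob(p, gamma, 0))
```

Inference on the second network must therefore use these conditional edge probabilities, which depend on both the unknown $p_{ij}$ and the observed first network; this dependence is the source of the extra care noted above.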