🤖 AI Summary
Controlling the false discovery rate (FDR) when predicting multiple missing links in a partially observed network is challenging because of complex dependency structures and unknown, heterogeneous missingness mechanisms. This paper proposes a distribution-free FDR control framework that makes no assumptions on the missing rates and applies to weighted, unweighted, undirected, and bipartite networks. The method has three key components: (1) conformal p-values built on the exchangeability structure of weighted graphon models; (2) a multi-splitting strategy that restores exchangeability within local test sets, giving valid row-wise FDR control; and (3) an e-value-based aggregation scheme that provides finite-sample FDR control across all missing links under arbitrary dependence. Experiments on synthetic and real-world networks confirm the effectiveness and robustness of the approach.
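As a rough illustration of component (1), the sketch below computes generic split-conformal p-values from a calibration set of nonconformity scores. It is not the paper's multi-splitting construction: the function name `conformal_p_values` and the toy scores are assumptions made for this example, and validity relies on the test score being exchangeable with the calibration scores.

```python
import numpy as np

def conformal_p_values(calib_scores, test_scores):
    """Generic split-conformal p-values (illustrative, not the paper's procedure).

    Larger scores are taken to mean "more unusual". For each test score s,
    p = (1 + #{calibration scores >= s}) / (n + 1).
    """
    calib = np.sort(np.asarray(calib_scores, dtype=float))
    test = np.asarray(test_scores, dtype=float)
    n = calib.size
    # searchsorted(..., side="left") counts calibration scores strictly below s,
    # so n minus that count is the number of calibration scores >= s.
    n_ge = n - np.searchsorted(calib, test, side="left")
    return (1 + n_ge) / (n + 1)

# Toy calibration and test scores (hypothetical values).
p = conformal_p_values([0.1, 0.2, 0.3, 0.4], [0.35, 0.05, 0.5])
print(p)  # p-values 0.4, 1.0 and 0.2
```

The `+ 1` in the numerator and denominator counts the test point itself, which is what keeps the p-value valid in finite samples when exchangeability holds.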
📝 Abstract
We propose a new method for predicting multiple missing links in partially observed networks while controlling the false discovery rate (FDR), a largely unresolved challenge in network analysis. The main difficulty lies in handling complex dependencies and unknown, heterogeneous missing patterns. We introduce conformal link prediction (clp), a distribution-free procedure grounded in the exchangeability structure of weighted graphon models. Our approach constructs conformal p-values via a novel multi-splitting strategy that restores exchangeability within local test sets, thereby ensuring valid row-wise FDR control, even under unknown missing mechanisms. To achieve FDR control across all missing links, we further develop a new aggregation scheme based on e-values, which accommodates arbitrary dependence across network predictions. Our method requires no assumptions on the missing rates, applies to weighted, unweighted, undirected, and bipartite networks, and enjoys finite-sample theoretical guarantees. Extensive simulations and a real-world data study confirm the effectiveness and robustness of the proposed approach.
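To give a feel for how e-values can deliver FDR control under arbitrary dependence, the sketch below implements the standard e-BH procedure: reject the k* hypotheses with the largest e-values, where k* is the largest k such that k · e_(k) ≥ m / α. This is a generic building block, not the aggregation scheme developed in the paper. The function name `e_bh` and the toy e-values are assumptions for illustration only.

```python
import numpy as np

def e_bh(e_values, alpha):
    """Standard e-BH procedure (illustrative, not the paper's aggregation scheme).

    Returns the sorted indices of rejected hypotheses at target FDR level alpha.
    """
    e = np.asarray(e_values, dtype=float)
    m = e.size
    order = np.argsort(-e)          # indices sorted by decreasing e-value
    sorted_e = e[order]
    k = np.arange(1, m + 1)
    ok = k * sorted_e >= m / alpha  # condition k * e_(k) >= m / alpha
    if not ok.any():
        return np.array([], dtype=int)
    k_star = np.max(np.nonzero(ok)[0]) + 1
    return np.sort(order[:k_star])

# Toy e-values for five candidate links (hypothetical values).
rejected = e_bh([50, 1, 30, 0.5, 12], alpha=0.1)
print(rejected)  # indices 0 and 2 are rejected
```

In this toy run m / α = 50, and the products k · e_(k) are 50, 60, 36, 4, and 2.5, so k* = 2 and the two largest e-values are rejected. Unlike the p-value-based Benjamini-Hochberg procedure, e-BH needs no correction factor to remain valid when the e-values are dependent.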