Conformal Link Prediction with False Discovery Rate Control

📅 2025-07-09
🤖 AI Summary
Controlling the false discovery rate (FDR) in multi-link prediction under partial network observability is challenging because of complex dependency structures and unknown, heterogeneous missingness mechanisms. This paper proposes a distribution-free FDR control framework that makes no assumptions about the missingness mechanism and applies to weighted, unweighted, undirected, and bipartite networks. The method has three key components: (1) conformal p-values built on the exchangeability structure of weighted graphon models; (2) a multi-splitting strategy that restores exchangeability within local test sets, giving valid row-wise FDR control; and (3) an e-value-based aggregation scheme that yields finite-sample FDR control across all missing links under arbitrary dependence. Simulations and a real-world data study support the effectiveness and robustness of the method.

📝 Abstract
We propose a new method for predicting multiple missing links in partially observed networks while controlling the false discovery rate (FDR), a largely unresolved challenge in network analysis. The main difficulty lies in handling complex dependencies and unknown, heterogeneous missing patterns. We introduce conformal link prediction (clp), a distribution-free procedure grounded in the exchangeability structure of weighted graphon models. Our approach constructs conformal p-values via a novel multi-splitting strategy that restores exchangeability within local test sets, thereby ensuring valid row-wise FDR control, even under unknown missing mechanisms. To achieve FDR control across all missing links, we further develop a new aggregation scheme based on e-values, which accommodates arbitrary dependence across network predictions. Our method requires no assumptions on the missing rates, applies to weighted, unweighted, undirected, and bipartite networks, and enjoys finite-sample theoretical guarantees. Extensive simulations and a real-world data study confirm the effectiveness and robustness of the proposed approach.
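
To make the p-value step more concrete, here is a minimal sketch of a generic split-conformal p-value followed by the Benjamini-Hochberg step-up rule. It is not the paper's clp procedure: the multi-splitting construction that restores exchangeability is not reproduced, and the names `conformal_p_values`, `benjamini_hochberg`, `calib_scores`, and `test_scores` are hypothetical. The sketch assumes that a larger score means stronger evidence for a link and that the calibration scores come from pairs known to have no link.

```python
import numpy as np


def conformal_p_values(calib_scores, test_scores):
    """Generic split-conformal p-values (illustrative, not the paper's method).

    p_j = (1 + #{i : calib_i >= test_j}) / (n + 1), where the calibration
    scores are assumed to come from known non-links.
    """
    calib = np.sort(np.asarray(calib_scores, dtype=float))
    test = np.asarray(test_scores, dtype=float)
    n = calib.size
    # searchsorted(side="left") counts calibration scores strictly below each
    # test score, so n minus that count is the number that are >= it.
    n_greater_equal = n - np.searchsorted(calib, test, side="left")
    return (1.0 + n_greater_equal) / (n + 1.0)


def benjamini_hochberg(p_values, alpha):
    """Return sorted indices rejected by the Benjamini-Hochberg step-up rule."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    passed = np.nonzero(p[order] <= thresholds)[0]
    if passed.size == 0:
        return np.array([], dtype=int)
    k = passed[-1] + 1  # largest k with p_(k) <= alpha * k / m
    return np.sort(order[:k])


# Hypothetical scores for one row of the network.
p_vals = conformal_p_values([0.1, 0.2, 0.3, 0.4], [0.5, 0.25, 0.0])
selected = benjamini_hochberg([0.01, 0.04, 0.03, 0.5], alpha=0.1)
```

With real data, the p-values for the candidate pairs in one row would be passed to `benjamini_hochberg`; whether that combination inherits a finite-sample guarantee depends on the exchangeability argument developed in the paper, which this sketch does not establish.
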
Problem

Research questions and friction points this paper is trying to address.

Control false discovery rate in network link prediction
Handle complex dependencies and missing patterns
Predict missing links without distribution assumptions
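
For reference, the FDR target named above can be written in its standard form. The symbols are assumptions introduced here for illustration, not notation taken from the paper: \mathcal{M} is the set of candidate missing pairs, \mathcal{H}_0 \subseteq \mathcal{M} the pairs with no true link, \mathcal{R} \subseteq \mathcal{M} the pairs the procedure reports as links, and \alpha the target level.

```latex
\mathrm{FDR}
  = \mathbb{E}\!\left[
      \frac{|\mathcal{R} \cap \mathcal{H}_0|}{\max\{|\mathcal{R}|,\, 1\}}
    \right]
  \le \alpha
```

The maximum in the denominator sets the ratio to zero when nothing is reported, so an empty prediction set contributes no false discoveries.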
Innovation

Methods, ideas, or system contributions that make the work stand out.

Conformal p-values via multi-splitting strategy
E-value aggregation for FDR control
Distribution-free method for various network types
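
As an illustration of FDR control with e-values, the sketch below implements the standard e-BH step-up rule, which is known to control the FDR under arbitrary dependence among valid e-values. The summary does not spell out the paper's aggregation scheme, so this is a stand-in rather than the authors' method; the function name `e_bh` and the example e-values are hypothetical.

```python
import numpy as np


def e_bh(e_values, alpha):
    """Return sorted indices rejected by the e-BH rule (illustrative stand-in).

    Rejects the k* hypotheses with the largest e-values, where k* is the
    largest k such that the k-th largest e-value is >= m / (alpha * k).
    """
    e = np.asarray(e_values, dtype=float)
    m = e.size
    order = np.argsort(-e)  # indices sorted by decreasing e-value
    k = np.arange(1, m + 1)
    passed = np.nonzero(e[order] >= m / (alpha * k))[0]
    if passed.size == 0:
        return np.array([], dtype=int)
    k_star = passed[-1] + 1
    return np.sort(order[:k_star])


# Hypothetical e-values for four candidate links.
rejected = e_bh([50.0, 1.0, 30.0, 0.5], alpha=0.1)
```

The rule needs only that each e-value has expectation at most one under its null, which is why it tolerates dependence across predictions without a correction factor.
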