🤖 AI Summary
Multi-source epidemic source localization—identifying the initial set of infected nodes from a single snapshot of node infection states—remains challenging, as existing methods either lack statistical guarantees or are constrained to specific diffusion models and network topologies. Method: We propose the first conformal prediction framework for this task with statistically guaranteed recall, requiring no assumptions about the underlying propagation model and applicable to arbitrary network structures and diffusion dynamics. Our core innovation is a novel scoring function that quantifies alignment between predicted infection probabilities and the true source set, enabling calibration of prediction sets to achieve user-specified recall levels. Results: Extensive experiments on diverse real-world and synthetic networks demonstrate that our method significantly outperforms state-of-the-art baselines, achieving high precision and computational scalability while strictly satisfying the prescribed coverage guarantee.
📝 Abstract
Detecting the origin of information or infection spread in networks is a fundamental challenge with applications in misinformation tracking, epidemiology, and beyond. We study the multi-source detection problem: given snapshot observations of node infection status on a graph, estimate the set of source nodes that initiated the propagation. Existing methods either lack statistical guarantees or are limited to specific diffusion models and assumptions. We propose a novel conformal prediction framework that provides statistically valid recall guarantees for source set detection, independent of the underlying diffusion process or data distribution. Our approach introduces principled score functions to quantify the alignment between predicted probabilities and true sources, and leverages a calibration set to construct prediction sets with user-specified recall and coverage levels. The method is applicable to both single- and multi-source scenarios, supports general network diffusion dynamics, and is computationally efficient for large graphs. Empirical results demonstrate that our method achieves rigorous coverage with competitive accuracy, outperforming existing baselines in both reliability and scalability. The code is available online.
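To make the calibration step concrete, the following is a minimal sketch of one generic split-conformal construction for recall-controlled source sets. It is not the paper's implementation: the score function, the function names (`recall_score`, `calibrate_threshold`, `predict_set`), and the parameters `recall_level` and `alpha` are assumptions chosen for illustration. It assumes some upstream model has already produced a per-node source probability for each snapshot, and that calibration and test snapshots are exchangeable.

```python
import math
import numpy as np


def recall_score(probs, sources, recall_level):
    """Largest threshold t such that {v : probs[v] >= t} contains at least
    a `recall_level` fraction of the true sources (illustrative score)."""
    src_probs = np.sort(np.asarray(probs, dtype=float)[list(sources)])[::-1]
    # Number of true sources that must be recovered; the small epsilon
    # guards against float round-up such as 0.7 * 10 = 7.000000000000001.
    k = max(1, math.ceil(recall_level * len(src_probs) - 1e-9))
    return float(src_probs[k - 1])


def calibrate_threshold(cal_probs, cal_sources, recall_level, alpha):
    """Pick a probability threshold from a calibration set so that, under
    exchangeability, a new snapshot reaches the target recall with
    probability at least 1 - alpha."""
    scores = np.sort(
        [recall_score(p, s, recall_level) for p, s in zip(cal_probs, cal_sources)]
    )
    n = len(scores)
    k = math.floor(alpha * (n + 1))  # conformal lower-quantile index
    if k < 1:
        # Too few calibration samples for this alpha: return every node.
        return -math.inf
    return float(scores[k - 1])


def predict_set(probs, tau):
    """Prediction set: all nodes whose predicted probability is >= tau."""
    return set(np.flatnonzero(np.asarray(probs, dtype=float) >= tau).tolist())


# Toy calibration data: three snapshots on a 4-node graph (made-up numbers).
cal_probs = [
    [0.9, 0.1, 0.8, 0.3],
    [0.2, 0.6, 0.1, 0.7],
    [0.4, 0.5, 0.3, 0.2],
]
cal_sources = [[0, 2], [1], [0, 1]]

tau = calibrate_threshold(cal_probs, cal_sources, recall_level=1.0, alpha=0.5)
candidates = predict_set([0.7, 0.6, 0.1, 0.59], tau)
```

In this sketch the threshold is the `floor(alpha * (n + 1))`-th smallest calibration score, so a smaller `alpha` or a higher `recall_level` lowers the threshold and enlarges the prediction set. The guarantee is marginal over the draw of calibration and test snapshots, and the sketch says nothing about how the node probabilities are obtained.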