Network Cross-Validation and Model Selection via Subsampling

📅 2025-04-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Traditional cross-validation fails on network-structured data because the observations are neither independent nor identically distributed and carry topological dependence. To address this, we propose NETCROP, a general-purpose network cross-validation framework based on overlapping subnetwork partitioning. NETCROP builds training sets from overlapping subnetworks and forms the test set from node pairs spanning distinct subnetworks, combining theoretical guarantees with computational efficiency. We establish consistency of the resulting estimator under standard network model assumptions. Empirical evaluation across diverse network models, including latent space, stochastic block, and graph neural network models, shows that NETCROP attains higher accuracy than state-of-the-art baselines for model selection and hyperparameter tuning. Moreover, it accelerates computation by one to two orders of magnitude and generalizes robustly across both synthetic and real-world networks.

📝 Abstract
Complex and large networks are becoming increasingly prevalent in scientific applications across various domains. Although a number of models and methods exist for such networks, cross-validation on networks remains challenging due to the unique structure of network data. In this paper, we propose a general cross-validation procedure called NETCROP (NETwork CRoss-Validation using Overlapping Partitions). The key idea is to divide the original network into multiple subnetworks with a shared overlap part, producing training sets consisting of the subnetworks and a test set with the node pairs between the subnetworks. This train-test split provides the basis for a network cross-validation procedure that can be applied to a wide range of model selection and parameter tuning problems for networks. The method is computationally efficient for large networks as it uses smaller subnetworks for the training step. We provide methodological details and theoretical guarantees for several model selection and parameter tuning tasks using NETCROP. Numerical results demonstrate that NETCROP performs accurate cross-validation on a diverse set of network model selection and parameter tuning problems. The results also indicate that NETCROP is computationally much faster, while often being more accurate, than existing methods for network cross-validation.
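The overlapping train-test split described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `netcrop_split` and its parameters (`num_sub`, `overlap_frac`) are hypothetical, and the sketch only shows the node-set partition, not the downstream model fitting.

```python
import numpy as np

def netcrop_split(n, num_sub=2, overlap_frac=0.2, rng=None):
    """Sketch of a NETCROP-style split over n nodes.

    Returns the node sets of the training subnetworks (each sharing a
    common overlap block) and the held-out node pairs whose endpoints
    lie in different non-overlap parts.
    """
    rng = np.random.default_rng(rng)
    perm = rng.permutation(n)
    n_overlap = int(overlap_frac * n)
    overlap = perm[:n_overlap]
    # Partition the remaining nodes into num_sub disjoint parts.
    parts = np.array_split(perm[n_overlap:], num_sub)
    # Each training subnetwork is its part plus the shared overlap.
    train_sets = [np.concatenate([overlap, p]) for p in parts]
    # Test set: node pairs spanning two distinct parts.
    test_pairs = [(int(i), int(j))
                  for a in range(num_sub) for b in range(a + 1, num_sub)
                  for i in parts[a] for j in parts[b]]
    return train_sets, test_pairs

train_sets, test_pairs = netcrop_split(100, num_sub=2, overlap_frac=0.2, rng=0)
```

With 100 nodes, a 20-node overlap, and two 40-node parts, each training subnetwork has 60 nodes and the test set holds the 1600 cross-part pairs; a model fitted on each subnetwork would then be scored on its predictions for those held-out pairs.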
Problem

Research questions and friction points this paper is trying to address.

Standard cross-validation is ill-suited to the dependent, non-i.i.d. structure of network data
Existing network cross-validation methods scale poorly to large networks
Model selection and parameter tuning for networks lack a general, computationally efficient procedure
Innovation

Methods, ideas, or system contributions that make the work stand out.

Splits the network into subnetworks sharing a common overlap; tests on node pairs between subnetworks
Trains on smaller subnetworks, keeping the procedure efficient for large networks
Delivers faster, and often more accurate, cross-validation than existing network methods