🤖 AI Summary
This study addresses the computational and storage challenges posed by large-scale network data, whose size often exceeds what existing statistical inference methods can handle efficiently without sacrificing rigor. The authors propose a scalable subsampling-based inference framework under the generalized random dot product graph (GRDPG) model. The method randomly selects a small subset of nodes, performs inference on the induced subgraph, and then uses graph interpolation, based on the observed connections between the subsample and the remaining nodes, to estimate features of the full network. By combining predictive subsampling with this interpolation step, the approach substantially reduces computational cost. Theoretical analysis establishes the consistency of the proposed estimator, and simulation studies demonstrate its high statistical power and computational efficiency in two-sample hypothesis testing.
📝 Abstract
Network datasets appear across a wide range of scientific fields, including biology, physics, and the social sciences. To enable data-driven discoveries from these networks, statistical inference techniques like estimation and hypothesis testing are crucial. However, the size of modern networks often exceeds the storage and computational capacities of existing methods, making timely, statistically rigorous inference difficult. In this work, we introduce a subsampling-based approach aimed at reducing the computational burden associated with estimation and two-sample hypothesis testing. Our strategy involves selecting a small random subset of nodes from the network, conducting inference on the resulting subgraph, and then using interpolation based on the observed connections between the subsample and the rest of the nodes to estimate the entire graph. We develop the methodology under the generalized random dot product graph framework, which affords broad applicability and permits rigorous analysis. Within this setting, we establish consistency guarantees and corroborate the practical effectiveness of the approach through comprehensive simulation studies.
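The three steps described in the abstract (subsample nodes, infer on the subgraph, interpolate to the remaining nodes) can be illustrated with a minimal sketch. The abstract does not specify the estimator, so this sketch assumes an adjacency spectral embedding on the subgraph and a least-squares out-of-sample extension for the remaining nodes. The function names `ase` and `subsample_and_interpolate` and all parameters are hypothetical, and this should not be read as the authors' implementation.

```python
import numpy as np

def ase(A, d):
    """Adjacency spectral embedding (assumed estimator for the subgraph).

    Keeps the d eigenpairs of largest magnitude, so negative eigenvalues
    (the indefinite part of a GRDPG) are retained via their signs.
    """
    vals, vecs = np.linalg.eigh(A)
    idx = np.argsort(np.abs(vals))[::-1][:d]
    lam, U = vals[idx], vecs[:, idx]
    X = U * np.sqrt(np.abs(lam))   # scale each eigenvector column
    signs = np.sign(lam)           # diagonal of the signature matrix I_{p,q}
    return X, signs

def subsample_and_interpolate(A, m, d, rng):
    """Estimate latent positions for all n nodes from an m-node subsample.

    Only the m x n block A[S, :] of the adjacency matrix is read.
    """
    n = A.shape[0]
    # Step 1: draw a uniform random subset of m nodes.
    S = rng.choice(n, size=m, replace=False)
    # Step 2: embed the induced subgraph.
    X_S, signs = ase(A[np.ix_(S, S)], d)
    # Step 3: interpolate. Under a GRDPG, the edges from node v to S have
    # mean X_S I_{p,q} x_v, so solve a least-squares problem per node.
    B = X_S * signs
    X_all = np.linalg.lstsq(B, A[S, :], rcond=None)[0].T
    X_all[S] = X_S                 # keep the in-sample embedding as is
    return X_all, signs, S
```

Under these assumptions, an estimate of the full edge-probability matrix is `(X_all * signs) @ X_all.T`. The only eigendecomposition is of an m x m matrix, and the interpolation step is a single least-squares solve against the m x n block, which is where the computational savings over embedding the full n x n graph would come from.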