AI Summary
This paper addresses hypothesis testing at the node, edge, and path levels in large-scale attributed graphs, proposing the first formal multi-granularity graph hypothesis testing framework. To overcome the limitations of conventional tabular methods on graph-structured data, the authors design PHASE, a path-hypothesis-aware random walk sampler, and its optimized variant PHASE$_{\text{opt}}$, which jointly target hypothesis-driven sampling and computational efficiency. The approach combines graph sampling theory, $m$-dimensional random walk modeling, and time-complexity analysis to preserve statistical validity while improving scalability. Experiments on three real-world attributed graph datasets show that, compared with generic sampling baselines, the framework improves testing accuracy by 12.7% and speeds up runtime by 3.8×, strengthening statistical inference on large-scale attributed graphs.
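For readers unfamiliar with the term, a generic $m$-dimensional random walk (often called frontier sampling) keeps $m$ walkers alive at once. At each step it picks one walker with probability proportional to the degree of that walker's current node and moves it to a uniformly random neighbor. The sketch below is a minimal, hypothesis-agnostic version of that generic walk, assuming an undirected graph stored as an adjacency list. It does not model how PHASE accounts for the paths in a hypothesis, and the name `m_dim_random_walk` is hypothetical.

```python
import random
from typing import Dict, List, Tuple

# Assumed representation: undirected graph as node -> list of neighbors.
Graph = Dict[int, List[int]]


def m_dim_random_walk(graph: Graph, m: int, steps: int, seed: int = 0) -> List[Tuple[int, int]]:
    """Generic m-dimensional random walk (hypothetical helper, not PHASE itself).

    Keeps m walkers; at each step one walker is chosen with probability
    proportional to the degree of its current node and moves to a uniformly
    random neighbor. Returns the traversed edges in order.
    """
    rng = random.Random(seed)
    # Only start walkers on nodes that have at least one neighbor.
    starts = [v for v, nbrs in graph.items() if nbrs]
    walkers = [rng.choice(starts) for _ in range(m)]
    sampled: List[Tuple[int, int]] = []
    for _ in range(steps):
        # Degree-proportional choice of which walker to advance.
        degrees = [len(graph[v]) for v in walkers]
        i = rng.choices(range(m), weights=degrees, k=1)[0]
        u = walkers[i]
        v = rng.choice(graph[u])  # uniform neighbor of the chosen walker's node
        sampled.append((u, v))
        walkers[i] = v
    return sampled


if __name__ == "__main__":
    toy = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
    print(m_dim_random_walk(toy, m=2, steps=10, seed=1))
```

Like any simple random walk, the sampled edges are close to uniform over edges, while visited nodes are biased toward high-degree nodes; node-level estimates built from such samples usually need reweighting.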
Abstract
Hypothesis testing is a statistical method used to draw conclusions about populations from sample data, typically represented in tables. With the prevalence of graph representations in real-life applications, hypothesis testing on graphs is gaining importance. In this work, we formalize node, edge, and path hypotheses on attributed graphs. We develop a sampling-based hypothesis testing framework, which can accommodate existing hypothesis-agnostic graph sampling methods. To achieve accurate and time-efficient sampling, we then propose a Path-Hypothesis-Aware SamplEr, PHASE, an $m$-dimensional random walk that accounts for the paths specified in the hypothesis. We further optimize its time efficiency and propose PHASE$_{\text{opt}}$. Experiments on three real datasets demonstrate the ability of our framework to leverage common graph sampling methods for hypothesis testing, and the superiority of hypothesis-aware sampling methods in terms of accuracy and time efficiency.
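To illustrate what "a sampling-based hypothesis testing framework that accommodates existing graph samplers" can look like, here is a minimal sketch of a node-level test. It assumes a hypothesis of the form "the mean of attribute `attr` over nodes satisfying a predicate exceeds `mu0`". The sampler is a plug-in callable, and the test uses a large-sample normal (z) approximation. All names (`node_mean_test`, `uniform_node_sampler`, the toy attributes) are hypothetical, and this is not the paper's procedure.

```python
import math
import random
import statistics
from typing import Callable, Dict, List

# Hypothetical toy representation: node id -> attribute name -> value.
Attrs = Dict[int, Dict[str, float]]
# A sampler is any callable that returns k sampled node ids.
Sampler = Callable[[int], List[int]]


def uniform_node_sampler(nodes: List[int], seed: int = 0) -> Sampler:
    """Hypothesis-agnostic baseline: uniform node sampling without replacement."""
    rng = random.Random(seed)

    def sample(k: int) -> List[int]:
        return rng.sample(nodes, k)

    return sample


def node_mean_test(
    attrs: Attrs,
    sampler: Sampler,
    k: int,
    predicate: Callable[[Dict[str, float]], bool],
    attr: str,
    mu0: float,
) -> float:
    """One-sided test of H1: mean(attr over nodes satisfying predicate) > mu0.

    Draws k nodes with the plug-in sampler, keeps those matching the node
    predicate, and returns a p-value from a large-sample z-approximation.
    """
    values = [attrs[v][attr] for v in sampler(k) if predicate(attrs[v])]
    if len(values) < 2:
        raise ValueError("too few sampled nodes satisfy the predicate")
    se = statistics.stdev(values) / math.sqrt(len(values))
    if se == 0:
        raise ValueError("zero sample variance; z statistic undefined")
    z = (statistics.fmean(values) - mu0) / se
    return 1.0 - statistics.NormalDist().cdf(z)


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic data: even nodes are "users" with higher scores.
    attrs = {
        v: {"is_user": float(v % 2 == 0),
            "score": rng.gauss(5.0, 1.0) if v % 2 == 0 else rng.gauss(0.0, 1.0)}
        for v in range(1000)
    }
    p = node_mean_test(attrs, uniform_node_sampler(list(attrs), seed=7), k=200,
                       predicate=lambda a: a["is_user"] == 1.0, attr="score", mu0=4.5)
    print(f"p-value: {p:.3g}")
```

Because the sampler is just a callable, a random-walk sampler could be passed in instead. Walk-based samplers visit high-degree nodes more often, though, so their samples would need reweighting (for example, by inverse degree) before this simple test is valid. The sketch omits that step.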