🤖 AI Summary
Assessing homophily in large-scale dynamic networks typically requires full-graph topology and node features, making it impractical under resource and privacy constraints.
Method: We propose a sampling-based, unbiased homophily estimation framework. The approach combines the Dirichlet energy of graph signals, which quantifies feature smoothness, with the Horvitz–Thompson (HT) unbiased estimator, so it applies to any network sampling design whose edge-inclusion probabilities can be derived analytically. We establish that the Dirichlet energy can be consistently estimated from sampled graphs and empirically study other heterophily measures as well.
Results: Extensive experiments on multiple benchmark datasets demonstrate that the HT estimator reliably and accurately captures structural homophily/heterophily using only a small fraction of sampled edges. It significantly reduces dependence on global information while maintaining robustness and precision. This yields a scalable, privacy-aware, and resource-efficient tool for graph analysis and principled GNN model selection in constrained environments.
📝 Abstract
Assessing homophily in large-scale networks is central to understanding structural regularities in graphs, and thus informs the choice of models (such as graph neural networks) adopted to learn from network data. Evaluating smoothness metrics requires access to the entire network topology and node features, which may be impractical in large-scale, dynamic, resource-limited, or privacy-constrained settings. In this work, we propose a sampling-based framework to estimate homophily via the Dirichlet energy (Laplacian-based total variation) of graph signals, leveraging the Horvitz-Thompson (HT) estimator for unbiased inference from partial graph observations. The Dirichlet energy is a so-called total, namely a sum of squared nodal feature deviations over graph edges; hence, it is estimable under general network sampling designs for which edge-inclusion probabilities can be analytically derived and used as weights in the proposed HT estimator. We establish that the Dirichlet energy can be consistently estimated from sampled graphs, and empirically study other heterophily measures as well. Experiments on several heterophilic benchmark datasets demonstrate the effectiveness of the proposed HT estimators in reliably capturing homophilic structure (or lack thereof) from sampled network measurements.
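To make the weighting idea concrete, the following is a minimal sketch rather than the authors' implementation. It assumes scalar node features and independent Bernoulli edge sampling, where every edge is kept with the same known probability p; this is just one simple design for which the inclusion probabilities are available in closed form. The HT estimate is then the sum over sampled edges of (x_i - x_j)^2 divided by that edge's inclusion probability. All function names here (`dirichlet_energy`, `bernoulli_edge_sample`, `ht_dirichlet_energy`) are hypothetical and chosen for illustration.

```python
import random


def dirichlet_energy(edges, x):
    """Full-graph Dirichlet energy: sum over edges of (x_i - x_j)^2."""
    return sum((x[i] - x[j]) ** 2 for i, j in edges)


def bernoulli_edge_sample(edges, p, rng):
    """Keep each edge independently with probability p (assumed design)."""
    return [e for e in edges if rng.random() < p]


def ht_dirichlet_energy(sampled_edges, x, inclusion_prob):
    """Horvitz-Thompson estimate: weight each sampled edge by 1 / pi_ij.

    inclusion_prob(i, j) must return the probability that edge (i, j)
    is included in the sample under the chosen design.
    """
    return sum(
        (x[i] - x[j]) ** 2 / inclusion_prob(i, j) for i, j in sampled_edges
    )


if __name__ == "__main__":
    rng = random.Random(0)
    n = 200
    # Toy ring graph with random scalar features (illustrative data only).
    edges = [(i, (i + 1) % n) for i in range(n)]
    x = [rng.random() for _ in range(n)]

    p = 0.3
    true_energy = dirichlet_energy(edges, x)

    # Average the HT estimate over repeated samples to see unbiasedness.
    reps = 2000
    estimates = [
        ht_dirichlet_energy(bernoulli_edge_sample(edges, p, rng), x, lambda i, j: p)
        for _ in range(reps)
    ]
    print("true energy:     ", true_energy)
    print("mean HT estimate:", sum(estimates) / reps)
```

Passing the inclusion probability as a function keeps the estimator independent of the sampling design: a different design only needs a different `inclusion_prob`, provided its edge-inclusion probabilities can be computed.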