Dirichlet Meets Horvitz and Thompson: Estimating Homophily in Large Networks via Sampling

📅 2025-12-18
🤖 AI Summary
Problem: Assessing homophily in large-scale dynamic networks typically requires the full graph topology and all node features, which is impractical under resource and privacy constraints. Method: We propose a sampling-based, unbiased homophily estimation framework. The approach combines the Dirichlet energy of graph signals, which quantifies feature smoothness, with the Horvitz-Thompson (HT) estimator, and applies to sampling designs whose edge-inclusion probabilities can be derived analytically. We establish that the Dirichlet energy can be consistently estimated from sampled graphs and empirically study other heterophily measures as well. Results: Experiments on several heterophilic benchmark datasets show that the HT estimators reliably capture homophilic structure (or the lack of it) from sampled network measurements, reducing dependence on global information. This yields a scalable, privacy-aware, and resource-efficient tool for graph analysis and principled GNN model selection in constrained environments.

📝 Abstract
Assessing homophily in large-scale networks is central to understanding structural regularities in graphs, and thus informs the choice of models (such as graph neural networks) adopted to learn from network data. Evaluating smoothness metrics requires access to the entire network topology and node features, which may be impractical in large-scale, dynamic, resource-limited, or privacy-constrained settings. In this work, we propose a sampling-based framework to estimate homophily via the Dirichlet energy (Laplacian-based total variation) of graph signals, leveraging the Horvitz-Thompson (HT) estimator for unbiased inference from partial graph observations. The Dirichlet energy is a so-called total (a sum of squared nodal feature deviations over graph edges); hence, it is estimable under general network sampling designs for which edge-inclusion probabilities can be derived analytically and used as weights in the proposed HT estimator. We establish that the Dirichlet energy can be consistently estimated from sampled graphs, and empirically study other heterophily measures as well. Experiments on several heterophilic benchmark datasets demonstrate the effectiveness of the proposed HT estimators in reliably capturing homophilic structure (or lack thereof) from sampled network measurements.
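
To make the abstract's argument concrete, the estimator can be written out as follows. This is a sketch in assumed notation, not necessarily the paper's: the graph is taken to be undirected and unweighted, each edge is counted once, $x_i$ is a scalar feature at node $i$, $\mathcal{E}$ is the full edge set, $\mathcal{E}_s$ is the sampled edge set, and $\pi_{ij}$ is the probability that edge $(i,j)$ is included in the sample.

```latex
\begin{align}
E(\mathbf{x}) &= \sum_{(i,j)\in\mathcal{E}} (x_i - x_j)^2, \\
\hat{E}_{\mathrm{HT}} &= \sum_{(i,j)\in\mathcal{E}_s} \frac{(x_i - x_j)^2}{\pi_{ij}},
  \qquad \pi_{ij} = \Pr\big[(i,j)\in\mathcal{E}_s\big] > 0, \\
\mathbb{E}\big[\hat{E}_{\mathrm{HT}}\big]
  &= \sum_{(i,j)\in\mathcal{E}} \frac{\Pr\big[(i,j)\in\mathcal{E}_s\big]}{\pi_{ij}}\,(x_i - x_j)^2
   = E(\mathbf{x}).
\end{align}
```

The last line shows why the inverse-probability weights give an unbiased estimate: each edge's contribution is scaled up by exactly the factor by which sampling thins it out. For vector-valued features, $(x_i - x_j)^2$ would be replaced by a squared norm of the difference.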
Problem

Research questions and friction points this paper is trying to address.

Estimates homophily in large networks via sampling
Addresses impracticality of full network topology access
Uses Dirichlet energy and HT estimator for unbiased inference
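
The points above can be illustrated with a minimal Python sketch. It assumes the simplest possible design, independent (Bernoulli) edge sampling with a known probability `p`, which is an illustrative choice and not necessarily one of the designs studied in the paper. All function names and the toy graph are hypothetical.

```python
import random

def dirichlet_energy(edges, x):
    """Exact Dirichlet energy: sum of squared feature differences over all edges."""
    return sum((x[i] - x[j]) ** 2 for i, j in edges)

def bernoulli_edge_sample(edges, p, rng):
    """Keep each edge independently with probability p (assumed sampling design)."""
    return [e for e in edges if rng.random() < p]

def ht_dirichlet_energy(sampled_edges, x, incl_prob):
    """Horvitz-Thompson estimate: weight each sampled edge by 1 / inclusion probability."""
    return sum((x[i] - x[j]) ** 2 / incl_prob(i, j) for i, j in sampled_edges)

# Toy graph (hypothetical): a 6-node ring with scalar node features.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
x = [0.0, 0.1, 0.2, 1.0, 1.1, 1.2]

p = 0.5
rng = random.Random(0)
exact = dirichlet_energy(edges, x)

# Average many independent HT estimates; unbiasedness means this mean
# should approach the exact energy as the number of repetitions grows.
estimates = [
    ht_dirichlet_energy(bernoulli_edge_sample(edges, p, rng), x, lambda i, j: p)
    for _ in range(20000)
]
mean_estimate = sum(estimates) / len(estimates)
print(f"exact={exact:.3f}  mean HT estimate={mean_estimate:.3f}")
```

A single estimate can be far from the exact value on a graph this small. What the sketch demonstrates is unbiasedness across repeated samples, not accuracy from one draw.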
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sampling-based framework estimates homophily via Dirichlet energy
Uses Horvitz-Thompson estimator for unbiased inference from partial graphs
Derives edge-inclusion probabilities as weights for consistent estimation
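
As an example of a design whose edge-inclusion probabilities have a closed form, consider induced-subgraph sampling: draw `n` of the `N` nodes uniformly without replacement and observe only the edges with both endpoints in the sample. An edge is then included with probability n(n-1) / (N(N-1)). The sketch below assumes this design for illustration only; the paper may analyze different or additional designs, and all names here are hypothetical. The full edge list is used only to simulate what a sampler would observe.

```python
import random

def induced_edge_inclusion_prob(n_sampled, n_total):
    """P(both endpoints drawn) under simple random node sampling without replacement."""
    return n_sampled * (n_sampled - 1) / (n_total * (n_total - 1))

def induced_subgraph_edges(edges, sampled_nodes):
    """Edges observed in the subgraph induced by the sampled nodes."""
    s = set(sampled_nodes)
    return [(i, j) for i, j in edges if i in s and j in s]

def ht_energy_induced(edges, x, n_sampled, rng):
    """HT estimate of the Dirichlet energy from one induced-subgraph sample."""
    n_total = len(x)
    nodes = rng.sample(range(n_total), n_sampled)
    pi = induced_edge_inclusion_prob(n_sampled, n_total)
    observed = induced_subgraph_edges(edges, nodes)
    # Every edge shares the same inclusion probability in this design,
    # so the weight 1 / pi can be factored out of the sum.
    return sum((x[i] - x[j]) ** 2 for i, j in observed) / pi

# Toy graph (hypothetical): a 6-node ring with scalar node features.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
x = [0.0, 0.1, 0.2, 1.0, 1.1, 1.2]
exact = sum((x[i] - x[j]) ** 2 for i, j in edges)

rng = random.Random(0)
estimates = [ht_energy_induced(edges, x, 4, rng) for _ in range(20000)]
mean_estimate = sum(estimates) / len(estimates)
print(f"exact={exact:.3f}  mean HT estimate={mean_estimate:.3f}")
```

Only the inclusion probability changes between designs; the estimator itself keeps the same inverse-probability-weighted form.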