Efficient Estimation of Shortest-Path Distance Distributions to Samples in Graphs

📅 2025-02-21
🤖 AI Summary
Problem: Estimating how graph sampling affects shortest-path distance distributions is hard without actually performing the sampling or computing all-pairs shortest paths. Method: The paper proposes an analytical evaluation framework that avoids both full-graph shortest-path computation and empirical sampling. Its core contribution, described as the first of its kind, is a closed-form estimate of the distribution of shortest-path distances from unsampled nodes to the sample. The estimate is derived solely from the node-degree distribution and is extended to community-structured graphs through random-graph modeling and community-aware approximations. Contribution/Results: Compared with simulation-based empirical methods, the approach achieves over 10× speedup while keeping the average error below 8% across diverse real-world and synthetic graphs. It also gives highly consistent results on downstream bias-comparison tasks. The implementation is publicly available.
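To make the idea of a degree-only estimate concrete, here is a minimal sketch. It is not the paper's estimator: it assumes a locally tree-like configuration-model graph and uniform node sampling with probability `p`, and it uses the standard generating-function recursion over the degree distribution. The function name `survival_to_sample` and its parameters are hypothetical.

```python
def survival_to_sample(degree_pmf, p, max_d):
    """Return [P(D > d | node is unsampled) for d = 0..max_d].

    ASSUMPTIONS (illustrative, not taken from the paper):
      * the graph is locally tree-like (configuration model),
      * every node is sampled independently with probability p,
      * degree_pmf maps degree k -> probability, with mean degree > 0.
    """
    mean_deg = sum(k * pk for k, pk in degree_pmf.items())

    def g0(x):
        # Generating function of the degree of a uniformly random node.
        return sum(pk * x ** k for k, pk in degree_pmf.items())

    def g1(x):
        # Generating function of the excess degree of a node reached by an edge.
        return sum(k * pk * x ** (k - 1)
                   for k, pk in degree_pmf.items() if k > 0) / mean_deg

    surv = [1.0]   # an unsampled node is at distance > 0 by definition
    a = 1.0 - p    # P(neighbor reached by an edge is unsampled)
    for _ in range(max_d):
        # No sampled node within the current radius along any incident edge.
        surv.append(g0(a))
        # Extend the radius by one hop along each edge.
        a = (1.0 - p) * g1(a)
    return surv


# 2-regular graph (every node has degree 2), half of the nodes sampled.
surv = survival_to_sample({2: 1.0}, p=0.5, max_d=2)
pmf = [surv[d - 1] - surv[d] for d in range(1, len(surv))]  # P(D = d), d >= 1
print(surv, pmf)
```

The probability mass at distance `d` is the difference of consecutive survival values. Degree-dependent sampling schemes and community structure would require a different recursion and are outside this sketch.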

📝 Abstract
As large graph datasets become increasingly common across many fields, sampling is often needed to reduce the graphs to manageable sizes. This procedure raises critical questions about representativeness, as no sample can capture the properties of the original graph perfectly, and different parts of the graph are not evenly affected by the loss. Recent work has shown that the distances from the non-sampled nodes to the sampled nodes can be a quantitative indicator of bias and fairness in graph machine learning. However, to our knowledge, there is no method for evaluating how a sampling method affects the distribution of shortest-path distances without actually performing the sampling and shortest-path calculation. In this paper, we present an accurate and efficient framework for estimating the distribution of shortest-path distances to the sample, applicable to a wide range of sampling methods and graph structures. Our framework is faster than empirical methods and only requires the specification of degree distributions. We also extend our framework to handle graphs with community structures. While this introduces a decrease in accuracy, we demonstrate that our framework remains highly accurate on downstream comparison-based tasks. Code is publicly available at https://github.com/az1326/shortest_paths.
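For reference, the empirical baseline that the abstract contrasts against (perform the sampling, then compute shortest paths) can be sketched as a multi-source breadth-first search. This is an illustrative sketch for unweighted graphs, not code from the linked repository; the adjacency-dict representation and the function names are assumptions.

```python
from collections import Counter, deque


def distances_to_sample(adj, sample):
    """Multi-source BFS: hop distance from each node to its nearest sampled node.

    adj maps node -> iterable of neighbors (unweighted, undirected graph).
    Nodes that cannot reach any sampled node are absent from the result.
    """
    dist = {s: 0 for s in sample}
    queue = deque(sample)
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def empirical_distribution(adj, sample):
    """Fraction of reachable non-sampled nodes at each distance from the sample."""
    sample = set(sample)
    dist = distances_to_sample(adj, sample)
    counts = Counter(d for node, d in dist.items() if node not in sample)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {d: c / total for d, c in sorted(counts.items())}


# Toy path graph 0-1-2-3-4 with node 2 as the only sampled node.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(empirical_distribution(adj, {2}))
```

A single BFS seeded with all sampled nodes costs one pass over the edges, so all-pairs shortest paths are not needed even for the baseline. The cost the framework avoids is repeating the sampling and this pass for every sampling method and sample size under comparison.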
Problem

Research questions and friction points this paper is trying to address.

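Estimating the distribution of shortest-path distances from non-sampled nodes to the sample without performing the sampling or the shortest-path calculation.
Evaluating how a sampling method affects the representativeness of the sampled graph.
Handling graphs with community structures efficiently.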
Innovation

Methods, ideas, or system contributions that make the work stand out.

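Estimates the distribution of shortest-path distances to the sample analytically, for a wide range of sampling methods and graph structures.
Requires only the specification of degree distributions.
Extends to graphs with community structures, with some loss of accuracy.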