Scalable and Provable Kemeny Constant Computation on Static and Dynamic Graphs: A 2-Forest Sampling Approach

📅 2025-11-20
🤖 AI Summary
Computing the Kemeny constant (the expected hitting time of a random walk from a source node to a randomly chosen target node) efficiently and accurately on large-scale static and dynamic graphs remains challenging. This paper proposes an unbiased estimation framework for the Kemeny constant based on 2-forest sampling. By introducing a path-mapping technique that establishes a structural correspondence between spanning trees and 2-forests, it reformulates Kemeny constant estimation as a combinatorial sampling problem amenable to efficient implementation, using Binary Indexed Trees to speed up spanning-tree sampling and traversal. The theoretical analysis shows that the estimator's error bound improves upon those of existing random-walk-based methods. Furthermore, the paper designs sample maintenance strategies that support edge insertions and deletions by updating only part of the samples while preserving accuracy. Extensive experiments on ten large real-world datasets show that the method outperforms state-of-the-art approaches in both accuracy and runtime.
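To make the spanning-tree/2-forest connection concrete, the sketch below computes the Kemeny constant $K$ of a tiny graph by brute-force enumeration. It relies on two classical identities, not on the paper's estimator. First, for a connected undirected graph with $m$ edges and degrees $d_i$, $K = \frac{1}{4m}\sum_{i,j} d_i d_j R_{ij}$, where $R_{ij}$ is the effective resistance. This assumes the target is drawn from the stationary distribution and $H_{ii} = 0$. Second, Kirchhoff's theorem gives $R_{ij}$ as the number of spanning 2-forests separating $i$ and $j$, divided by the number of spanning trees $|\mathcal{T}(G)|$. Combining them yields $K = \frac{1}{2m\,|\mathcal{T}(G)|}\sum_{F} \mathrm{vol}(T_1^F)\,\mathrm{vol}(T_2^F)$. The sum runs over all spanning 2-forests $F$ with components $T_1^F, T_2^F$, and $\mathrm{vol}$ is the sum of degrees. The function names are hypothetical. Enumeration is exponential, so this serves only as a correctness check: the paper samples trees instead of enumerating them.

```python
from itertools import combinations


def _forest_roots(n, edge_subset):
    """Return each node's component root if edge_subset is acyclic, else None."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edge_subset:
        ru, rv = find(u), find(v)
        if ru == rv:
            return None  # this edge closes a cycle
        parent[ru] = rv
    return [find(x) for x in range(n)]


def kemeny_via_2_forests(n, edges):
    """Exact Kemeny constant of a small connected graph via 2-forest enumeration."""
    m = len(edges)
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    # Spanning trees = acyclic edge subsets of size n - 1.
    num_trees = sum(
        1 for s in combinations(edges, n - 1) if _forest_roots(n, s) is not None
    )
    total = 0
    # Spanning 2-forests = acyclic edge subsets of size n - 2 (exactly two components).
    for s in combinations(edges, n - 2):
        roots = _forest_roots(n, s)
        if roots is None:
            continue
        vol = {}
        for x in range(n):
            vol[roots[x]] = vol.get(roots[x], 0) + deg[x]
        vol_a, vol_b = vol.values()
        total += vol_a * vol_b
    return total / (2 * m * num_trees)
```

As a worked example, in the triangle each of the three 2-forests is one edge plus an isolated node, contributing $4 \cdot 2 = 8$. With $m = 3$ and three spanning trees, $K = 24/(2 \cdot 3 \cdot 3) = 4/3$.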


📝 Abstract
The Kemeny constant, defined as the expected hitting time of a random walk from a source node to a randomly chosen target node, is a fundamental metric in graph data management with many real-world applications. However, computing it exactly on large graphs is highly challenging, as it requires inverting large graph matrices. Existing solutions mainly rely on approximate random-walk-based methods, which still need large sample sizes and lack strong theoretical guarantees. In this paper, we propose a new approach for approximating the Kemeny constant via 2-forest sampling. We first derive an unbiased estimator expressed through spanning trees by introducing a path mapping technique that establishes a direct correspondence between spanning trees and certain classes of 2-forests. Compared to random-walk-based estimators, 2-forest-based estimators yield a better theoretical bound. We further design efficient algorithms to sample and traverse spanning trees, leveraging data structures such as the Binary Indexed Tree (BIT) for optimization. Our theoretical analysis shows that the Kemeny constant can be approximated with relative error $\epsilon$ in $O\left(\frac{\Delta^2 \bar{d}^2}{\epsilon^2}\left(\tau + n\min(\log n, \Delta)\right)\right)$ time, where $\tau$ is the tree-sampling time, $n$ is the number of nodes, $\bar{d}$ is the average degree, and $\Delta$ is the graph diameter. This complexity is near-linear in practice. Moreover, existing methods largely target static graphs and lack efficient mechanisms for dynamic updates. To address this, we propose two sample maintenance strategies that partially update samples while preserving accuracy on dynamic graphs. Extensive experiments on 10 large real-world datasets demonstrate that our method consistently outperforms state-of-the-art approaches in both efficiency and accuracy on static and dynamic graphs.
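The abstract treats the tree sampler abstractly, charging its cost as $\tau$, and does not say which sampler the paper uses. One common way to draw uniformly random spanning trees is Wilson's algorithm, which grows the tree from loop-erased random walks. The sketch below assumes that choice purely for illustration. The function name and the parent-pointer output format are hypothetical.

```python
import random


def wilson_uniform_spanning_tree(adj, root=0, rng=None):
    """Sample a uniformly random spanning tree with Wilson's algorithm.

    adj  -- neighbor lists of a connected, unweighted, undirected graph
    root -- node the tree is grown from (any node works)
    Returns parent pointers: parent[v] is v's neighbor toward the root,
    and parent[root] is None.
    """
    if rng is None:
        rng = random.Random()
    n = len(adj)
    in_tree = [False] * n
    parent = [None] * n
    in_tree[root] = True
    for start in range(n):
        # Random walk from `start` until it hits the current tree. Overwriting
        # parent[u] on every visit keeps only the last exit from u, which
        # performs the loop erasure implicitly.
        u = start
        while not in_tree[u]:
            parent[u] = rng.choice(adj[u])
            u = parent[u]
        # Retrace the loop-erased path from `start` and attach it to the tree.
        u = start
        while not in_tree[u]:
            in_tree[u] = True
            u = parent[u]
    return parent
```

The tree's edges are the pairs `(v, parent[v])` for every non-root `v`. As a sanity check, on a triangle each of its three spanning trees should appear with frequency close to 1/3 over many draws.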
Problem

Research questions and friction points this paper is trying to address.

Computing the Kemeny constant exactly on large graphs is challenging because it requires inverting large graph matrices
Existing random-walk-based approximation methods need large sample sizes and lack strong theoretical guarantees
Current approaches mainly target static graphs and lack efficient mechanisms for dynamic updates
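The matrix-inversion bottleneck listed above shows up clearly in a minimal dense baseline. The sketch below assumes a small, connected, unweighted, undirected graph, and that the target node is drawn from the stationary distribution. It computes $K$ two ways with NumPy. The first uses the eigenvalues $\lambda_2, \dots, \lambda_n$ of the transition matrix $P = D^{-1}A$, via $K = \sum_{k \ge 2} 1/(1-\lambda_k)$. The second inverts the Laplacian (shifted by $J/n$ so it is invertible) to obtain effective resistances, then applies $K = \frac{1}{4m}\sum_{i,j} d_i d_j R_{ij}$. Both take $O(n^3)$ time and $O(n^2)$ memory, which is why such exact methods do not scale. The function names are hypothetical.

```python
import numpy as np


def adjacency(n, edges):
    """Dense adjacency matrix of an unweighted, undirected graph."""
    A = np.zeros((n, n))
    for u, v in edges:
        A[u, v] = A[v, u] = 1.0
    return A


def kemeny_eigen(A):
    """K = sum over non-unit eigenvalues of P of 1 / (1 - lambda)."""
    d = A.sum(axis=1)
    # D^{-1/2} A D^{-1/2} is symmetric and has the same spectrum as P = D^{-1} A.
    S = A / np.sqrt(np.outer(d, d))
    lam = np.linalg.eigvalsh(S)  # ascending; the last eigenvalue equals 1
    return float(np.sum(1.0 / (1.0 - lam[:-1])))


def kemeny_resistance(A):
    """K = (1 / 4m) * sum_{i,j} d_i d_j R_ij, with R from a dense inverse."""
    n = A.shape[0]
    d = A.sum(axis=1)
    m = d.sum() / 2.0
    L = np.diag(d) - A
    # For a connected graph L + J/n is invertible; the J/n shift cancels in R_ij.
    M = np.linalg.inv(L + np.ones((n, n)) / n)
    diag = np.diag(M)
    R = diag[:, None] + diag[None, :] - 2.0 * M  # effective resistances
    return float(d @ R @ d / (4.0 * m))
```

Both functions should agree up to floating-point error. For example, both give $4/3$ for the triangle and $2.5$ for the 4-cycle.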
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses 2-forest sampling, via a path mapping between spanning trees and 2-forests, for unbiased Kemeny constant estimation
Leverages Binary Indexed Trees to speed up spanning tree sampling and traversal
Maintains accuracy on dynamic graphs by partially updating samples
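The abstract does not detail how the Binary Indexed Tree is used. The sketch below shows one standard pattern that fits the 2-forest view, offered only as an assumption. A rooted spanning tree is laid out in DFS (Euler-tour) order so that every subtree occupies a contiguous range. Node degrees are stored in a Fenwick tree (BIT), so each subtree's volume is a range sum. Cutting the tree edge between $v$ and its parent then leaves a 2-forest whose component volumes are the subtree sum and its complement. All names here are hypothetical.

```python
class FenwickTree:
    """Binary Indexed Tree: point updates and prefix sums in O(log n)."""

    def __init__(self, n):
        self.n = n
        self.tree = [0] * (n + 1)  # internal 1-based array

    def add(self, i, delta):
        """Add delta to position i (0-based)."""
        i += 1
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i

    def prefix_sum(self, i):
        """Sum of positions 0 .. i-1."""
        s = 0
        while i > 0:
            s += self.tree[i]
            i -= i & -i
        return s

    def range_sum(self, lo, hi):
        """Sum of positions lo .. hi-1."""
        return self.prefix_sum(hi) - self.prefix_sum(lo)


def euler_intervals(children, root):
    """Iterative DFS: the subtree of v occupies positions tin[v] .. tout[v]-1."""
    tin, tout = {}, {}
    timer = 0
    stack = [(root, False)]
    while stack:
        v, finished = stack.pop()
        if finished:
            tout[v] = timer
            continue
        tin[v] = timer
        timer += 1
        stack.append((v, True))  # popped after all descendants
        for c in children.get(v, []):
            stack.append((c, False))
    return tin, tout


def subtree_volumes(parent, deg):
    """Sum of graph degrees inside each rooted subtree of a spanning tree.

    Cutting the tree edge (v, parent[v]) leaves a 2-forest with component
    volumes subtree_vol[v] and sum(deg) - subtree_vol[v].
    """
    n = len(parent)
    children, root = {}, None
    for v, p in enumerate(parent):
        if p is None:
            root = v
        else:
            children.setdefault(p, []).append(v)
    tin, tout = euler_intervals(children, root)
    bit = FenwickTree(n)
    for v in range(n):
        bit.add(tin[v], deg[v])
    return [bit.range_sum(tin[v], tout[v]) for v in range(n)]
```

For a static tree, a single post-order pass computes the same sums in $O(n)$. The BIT becomes worthwhile when node weights change between queries, for example when degrees change after an edge insertion, since each point update and each subtree query costs $O(\log n)$.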