Scalable and Provable Kemeny Constant Computation on Static and Dynamic Graphs: A 2-Forest Sampling Approach

📅 2025-11-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Computing the Kemeny constant—the expected hitting time of a random walk—efficiently and accurately on large-scale static and dynamic graphs remains challenging. This paper proposes the first unbiased estimation framework for the Kemeny constant based on 2-forest sampling. By introducing a path-mapping technique that establishes a structural correspondence between spanning trees and 2-forests, we reformulate Kemeny constant estimation as a combinatorial sampling problem amenable to efficient implementation. We provide theoretical guarantees showing our estimator’s error bound strictly improves upon existing random-walk-based methods. Furthermore, we design a dynamic sampling maintenance mechanism supporting edge insertions and deletions, leveraging Binary Indexed Trees and near-linear-time algorithms for efficient updates. Extensive experiments on ten real-world large-scale datasets demonstrate that our method significantly outperforms state-of-the-art approaches in both accuracy and runtime, while offering strong theoretical guarantees and practical scalability.
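The 2-forest idea can be made concrete on a toy graph. A spanning 2-forest is a spanning forest with exactly two trees, and deleting any edge of a spanning tree produces one. The sketch below samples a uniform spanning tree with Wilson's loop-erased random-walk algorithm and then cuts one tree edge. The choice of Wilson's algorithm and the helper names `wilson_spanning_tree` and `cut_to_two_forest` are assumptions made for illustration: the summary does not say which tree sampler the paper uses, and the cut step is not the paper's path-mapping technique.

```python
import random


def wilson_spanning_tree(adj, root, rng=random):
    """Sample a uniform spanning tree of a connected undirected graph.

    adj maps each node to a list of neighbours. Returns a dict of parent
    pointers oriented toward `root` (the root maps to None).
    Illustrative sketch; not necessarily the sampler used in the paper.
    """
    parent = {root: None}
    for start in adj:
        if start in parent:
            continue
        # Random walk from `start` until it hits the current tree.
        # Overwriting nxt[u] on a revisit erases loops implicitly.
        nxt = {}
        u = start
        while u not in parent:
            nxt[u] = rng.choice(adj[u])
            u = nxt[u]
        # Retrace the loop-erased path and attach it to the tree.
        u = start
        while u not in parent:
            parent[u] = nxt[u]
            u = nxt[u]
    return parent


def cut_to_two_forest(parent, cut_node):
    """Delete the tree edge (cut_node, parent[cut_node]).

    Returns the node sets of the two resulting trees (hypothetical helper).
    """
    if parent[cut_node] is None:
        raise ValueError("the root has no parent edge to cut")
    children = {}
    for u, p in parent.items():
        if p is not None:
            children.setdefault(p, []).append(u)
    # The subtree below cut_node forms one tree; the rest forms the other.
    side = set()
    stack = [cut_node]
    while stack:
        u = stack.pop()
        side.add(u)
        stack.extend(children.get(u, []))
    return side, set(parent) - side


if __name__ == "__main__":
    cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
    tree = wilson_spanning_tree(cycle, root=0, rng=random.Random(7))
    print(tree)
    print(cut_to_two_forest(tree, 2))
```

Parent pointers keep the tree compact and make the cut a single subtree traversal, which is why the sketch uses them rather than an edge list.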

📝 Abstract

The Kemeny constant, defined as the expected hitting time of a random walk from a source node to a randomly chosen target node, is a fundamental metric in graph data management with many real-world applications. However, computing it exactly on large graphs is highly challenging, as it requires inverting large graph matrices. Existing solutions mainly rely on approximate random-walk-based methods, which still need large sample sizes and lack strong theoretical guarantees. In this paper, we propose a new approach for approximating the Kemeny constant via 2-forest sampling. We first derive an unbiased estimator expressed through spanning trees by introducing a path mapping technique that establishes a direct correspondence between spanning trees and certain classes of 2-forests. Compared to random-walk-based estimators, 2-forest-based estimators yield a better theoretical bound. We further design efficient algorithms to sample and traverse spanning trees, leveraging data structures such as the Binary Indexed Tree (BIT) for optimization. Our theoretical analysis shows that the Kemeny constant can be approximated with relative error $\varepsilon$ in $O\left(\frac{\Delta^2 \bar{d}^2}{\varepsilon^2}\left(\tau + n \min(\log n, \Delta)\right)\right)$ time, where $\tau$ is the tree-sampling time, $\bar{d}$ is the average degree, and $\Delta$ is the graph diameter. This complexity is near-linear in practice. Moreover, existing methods largely target static graphs and lack efficient mechanisms for dynamic updates. To address this, we propose two sample maintenance strategies that partially update samples while preserving accuracy on dynamic graphs. Extensive experiments on 10 large real-world datasets demonstrate that our method consistently outperforms state-of-the-art approaches in both efficiency and accuracy on static and dynamic graphs.
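To make the definition concrete, here is a minimal sketch of the exact computation on a tiny graph. It assumes the standard convention that the target is drawn from the stationary distribution (degree divided by twice the edge count) and that the hitting time from a node to itself is 0; the abstract does not spell out either choice. The function names `hitting_times_to` and `kemeny_constant` are hypothetical. Each target requires solving a linear system, which is the cost that makes the exact approach impractical on large graphs.

```python
from fractions import Fraction


def hitting_times_to(adj, target):
    """Exact expected hitting times h[u] from every node u to `target`.

    Solves (I - P restricted to non-target nodes) h = 1 by Gauss-Jordan
    elimination over rationals. Assumes a connected simple graph.
    """
    nodes = [u for u in adj if u != target]
    idx = {u: i for i, u in enumerate(nodes)}
    m = len(nodes)
    # Augmented matrix: m rows, m coefficient columns plus one right-hand side.
    A = [[Fraction(0)] * (m + 1) for _ in range(m)]
    for u in nodes:
        i = idx[u]
        A[i][i] += 1
        A[i][m] = Fraction(1)
        for v in adj[u]:
            if v != target:
                A[i][idx[v]] -= Fraction(1, len(adj[u]))
    for c in range(m):
        pivot_row = next(r for r in range(c, m) if A[r][c] != 0)
        A[c], A[pivot_row] = A[pivot_row], A[c]
        pivot = A[c][c]
        A[c] = [x / pivot for x in A[c]]
        for r in range(m):
            if r != c and A[r][c] != 0:
                factor = A[r][c]
                A[r] = [x - factor * y for x, y in zip(A[r], A[c])]
    h = {u: A[idx[u]][m] for u in nodes}
    h[target] = Fraction(0)
    return h


def kemeny_constant(adj, source=None):
    """Sum over targets j of pi_j * H(source, j), with pi_j = deg(j) / (2 * edges)."""
    if source is None:
        source = next(iter(adj))
    degree_sum = sum(len(nbrs) for nbrs in adj.values())
    total = Fraction(0)
    for j in adj:
        pi_j = Fraction(len(adj[j]), degree_sum)
        total += pi_j * hitting_times_to(adj, j)[source]
    return total


if __name__ == "__main__":
    triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
    path = {0: [1], 1: [0, 2], 2: [1]}
    print(kemeny_constant(triangle))        # 4/3
    print(kemeny_constant(path, source=0))  # 3/2
    print(kemeny_constant(path, source=1))  # 3/2, independent of the source
```

Rational arithmetic keeps the toy results exact, so the independence from the source node can be checked with plain equality.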

Problem

Research questions and friction points this paper is trying to address.

Computing the Kemeny constant exactly on large graphs is challenging because it requires inverting large graph matrices

Existing approximation methods need large sample sizes and lack strong theoretical guarantees

Current approaches mainly target static graphs without efficient dynamic update mechanisms

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses 2-forest sampling for Kemeny constant estimation

Leverages Binary Indexed Trees to speed up spanning tree sampling and traversal

Maintains accuracy with partial updates on dynamic graphs
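For readers unfamiliar with the data structure named above, here is a generic Binary Indexed Tree (Fenwick tree) sketch. It supports point updates and prefix sums in O(log n) each. How the paper applies the structure to spanning trees is not described on this page, so the class name `FenwickTree` and its interface are illustrative assumptions rather than the authors' implementation.

```python
class FenwickTree:
    """Binary Indexed Tree over n numeric slots, 0-indexed externally."""

    def __init__(self, n):
        self.n = n
        self.tree = [0] * (n + 1)  # internal array is 1-indexed

    def add(self, i, delta):
        """Add `delta` to slot i."""
        i += 1
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i  # move to the next node covering slot i

    def prefix_sum(self, i):
        """Return the sum of slots 0..i inclusive."""
        i += 1
        total = 0
        while i > 0:
            total += self.tree[i]
            i -= i & -i  # drop the lowest set bit
        return total

    def range_sum(self, lo, hi):
        """Return the sum of slots lo..hi inclusive."""
        below = self.prefix_sum(lo - 1) if lo > 0 else 0
        return self.prefix_sum(hi) - below


if __name__ == "__main__":
    bit = FenwickTree(5)
    for i, value in enumerate([3, 1, 4, 1, 5]):
        bit.add(i, value)
    print(bit.prefix_sum(2))    # 8
    print(bit.range_sum(1, 3))  # 6
```

The logarithmic update cost is what makes this structure attractive when samples must be adjusted after edge insertions or deletions instead of being rebuilt.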
