PageRank Centrality in Directed Graphs with Bounded In-Degree

📅 2025-08-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper studies the computational complexity of locally estimating the PageRank centrality of a single node in a directed graph, with the goal of closing the gap between the known upper and lower bounds when the complexity is parameterized by the maximum in-degree Δ_in. The paper proposes an algorithm built on a randomized backwards propagation process: starting from the target node, it propagates backward along incoming edges only selectively, guided by Monte Carlo estimates of PageRank scores. Its complexity matches the previously best lower bound, Ω(n^{1/2} · min{Δ_in^{1/2}/n^γ, Δ_out^{1/2}/n^γ, m^{1/4}}), up to logarithmic factors. This removes a gap of Ω(n^γ) between the earlier bounds, a gap that is largest when Δ_in is small. The algorithm assumes that n, Δ_in, and Δ_out are known in advance.

📝 Abstract

We study the computational complexity of locally estimating a node's PageRank centrality in a directed graph $G$. For any node $t$, its PageRank centrality $\pi(t)$ is defined as the probability that a random walk in $G$, starting from a uniformly chosen node, terminates at $t$, where each step terminates with a constant probability $\alpha\in(0,1)$. To obtain a multiplicative $\big(1\pm O(1)\big)$-approximation of $\pi(t)$ with probability $\Omega(1)$, the previously best upper bound is $O\big(n^{1/2}\min\{\Delta_{in}^{1/2},\Delta_{out}^{1/2},m^{1/4}\}\big)$ from [Wang, Wei, Wen, Yang STOC '24], where $n$ and $m$ denote the number of nodes and edges in $G$, and $\Delta_{in}$ and $\Delta_{out}$ upper bound the in-degrees and out-degrees of $G$, respectively. The same paper implicitly gives the previously best lower bound of $\Omega\big(n^{1/2}\min\{\Delta_{in}^{1/2}/n^{\gamma},\Delta_{out}^{1/2}/n^{\gamma},m^{1/4}\}\big)$, where $\gamma=\frac{\log(1/(1-\alpha))}{4\log\Delta_{in}-2\log(1/(1-\alpha))}$ if $\Delta_{in}>1/(1-\alpha)$, and $\gamma=1/2$ if $\Delta_{in}\le 1/(1-\alpha)$. As $\gamma$ only depends on $\Delta_{in}$, the known upper bound is tight if we only parameterize the complexity by $n$, $m$, and $\Delta_{out}$. However, there remains a gap of $\Omega(n^{\gamma})$ when considering $\Delta_{in}$, and this gap is large when $\Delta_{in}$ is small. In the extreme case where $\Delta_{in}\le 1/(1-\alpha)$, we have $\gamma=1/2$, leading to a gap of $\Omega(n^{1/2})$ between the bounds $O(n^{1/2})$ and $\Omega(1)$. In this paper, we present a new algorithm that achieves the above lower bound (up to logarithmic factors). The algorithm assumes that $n$ and the bounds $\Delta_{in}$ and $\Delta_{out}$ are known in advance. Our key technique is a novel randomized backwards propagation process which only propagates selectively based on Monte Carlo estimated PageRank scores.
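The random-walk definition of PageRank centrality in the abstract can be illustrated with a naive Monte Carlo estimator. The sketch below is not the paper's algorithm: it samples walks from uniformly chosen start nodes and so does not have the local, sublinear behaviour the paper targets. The function name `estimate_pagerank`, the adjacency-dictionary graph format, and the requirement that every node has at least one out-neighbor are assumptions made for illustration.

```python
import random


def estimate_pagerank(out_neighbors, t, alpha, num_walks, rng=None):
    """Estimate pi(t) as the fraction of alpha-terminating walks that stop at t.

    out_neighbors: dict mapping each node to a list of its out-neighbors
                   (hypothetical format; every node is assumed to have
                   at least one out-neighbor).
    alpha:         per-step termination probability, in (0, 1).
    """
    rng = rng or random.Random()
    nodes = list(out_neighbors)
    hits = 0
    for _ in range(num_walks):
        v = rng.choice(nodes)            # start from a uniformly chosen node
        while rng.random() >= alpha:     # continue with probability 1 - alpha
            successors = out_neighbors[v]
            if not successors:
                raise ValueError(f"node {v!r} has no out-neighbors")
            v = rng.choice(successors)   # move to a uniform out-neighbor
        if v == t:                       # the walk terminated at the target
            hits += 1
    return hits / num_walks


if __name__ == "__main__":
    # Directed 4-cycle: by symmetry every node has centrality 1/4.
    cycle = {0: [1], 1: [2], 2: [3], 3: [0]}
    print(estimate_pagerank(cycle, 0, alpha=0.2, num_walks=20000,
                            rng=random.Random(0)))
```

On the 4-cycle the estimate should land near 1/4. Resolving a small $\pi(t)$ to within a constant factor this way needs on the order of $1/\pi(t)$ walks, which is why local algorithms combine sampling with propagation from the target.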

Problem

Research questions and friction points this paper is trying to address.

Estimating PageRank centrality in directed graphs efficiently

Closing complexity gap for bounded in-degree graphs

Developing a novel randomized backwards propagation algorithm
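The complexity gap named in the list above can be made concrete by evaluating the two bound expressions from the abstract with all hidden constants and logarithmic factors dropped. This is only an arithmetic sketch: the helper names `gamma_exponent` and `bound_shapes` are hypothetical, and the returned numbers show the shape of the bounds rather than actual running times.

```python
import math


def gamma_exponent(alpha, delta_in):
    """Exponent gamma from the abstract, as a function of alpha and Delta_in."""
    threshold = 1.0 / (1.0 - alpha)
    if delta_in <= threshold:
        return 0.5
    log_t = math.log(threshold)
    return log_t / (4.0 * math.log(delta_in) - 2.0 * log_t)


def bound_shapes(n, m, delta_in, delta_out, alpha):
    """Return (upper, lower) bound expressions with constants dropped."""
    g = gamma_exponent(alpha, delta_in)
    root_n = math.sqrt(n)
    # Previously best upper bound: n^{1/2} * min(D_in^{1/2}, D_out^{1/2}, m^{1/4})
    upper = root_n * min(math.sqrt(delta_in), math.sqrt(delta_out), m ** 0.25)
    # Lower bound: the two degree terms are divided by n^gamma
    lower = root_n * min(math.sqrt(delta_in) / n ** g,
                         math.sqrt(delta_out) / n ** g,
                         m ** 0.25)
    return upper, lower


if __name__ == "__main__":
    # Extreme case from the abstract: Delta_in <= 1/(1 - alpha), so gamma = 1/2.
    upper, lower = bound_shapes(n=10**6, m=10**6, delta_in=1, delta_out=1, alpha=0.2)
    print(upper, lower)  # shapes of the O(n^{1/2}) and Omega(1) bounds
    # gamma shrinks as Delta_in grows, so the gap n^gamma narrows.
    for d in (2, 10, 100):
        print(d, gamma_exponent(0.2, d))
```

The two branches of `gamma_exponent` agree at $\Delta_{in}=1/(1-\alpha)$, where the formula evaluates to $1/2$, so the exponent varies continuously with $\Delta_{in}$.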

Innovation

Methods, ideas, or system contributions that make the work stand out.

Novel randomized backwards propagation process

Selective propagation using Monte Carlo

Complexity matching the known lower bound up to logarithmic factors
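For readers unfamiliar with backwards propagation, the following sketch shows the classical deterministic backward push procedure on which such methods build. It is not the paper's randomized, selective process, which the summary above does not specify in enough detail to reproduce. The function name `backward_push`, the residual threshold `r_max`, and the adjacency-dictionary format are assumptions; the sketch also builds in-neighbor lists from the whole graph for convenience, whereas a local algorithm would query them on demand.

```python
from collections import defaultdict, deque


def backward_push(out_neighbors, t, alpha, r_max):
    """Approximate pi(t) by pushing residual mass backward from t.

    Maintains, for every start node s, the invariant
        pi(s, t) = reserve[s] + sum_v pi(s, v) * residual[v],
    where pi(s, v) is the probability that an alpha-terminating walk
    from s stops at v. The returned value underestimates pi(t) by at
    most r_max.
    """
    # Convenience only: a local algorithm would query in-neighbors on demand.
    in_neighbors = defaultdict(list)
    for u, successors in out_neighbors.items():
        for v in successors:
            in_neighbors[v].append(u)

    reserve = defaultdict(float)
    residual = defaultdict(float)
    residual[t] = 1.0
    queue, queued = deque([t]), {t}

    while queue:
        v = queue.popleft()
        queued.discard(v)
        r = residual[v]
        if r <= r_max:
            continue
        residual[v] = 0.0
        reserve[v] += alpha * r          # mass from walks that stop at v
        for u in in_neighbors[v]:        # mass from walks that step u -> v
            residual[u] += (1.0 - alpha) * r / len(out_neighbors[u])
            if residual[u] > r_max and u not in queued:
                queue.append(u)
                queued.add(u)

    # pi(t) is the average of pi(s, t) over uniformly chosen start nodes s.
    return sum(reserve.values()) / len(out_neighbors)


if __name__ == "__main__":
    cycle = {0: [1], 1: [2], 2: [3], 3: [0]}
    print(backward_push(cycle, 0, alpha=0.2, r_max=1e-9))  # close to 0.25
```

Plain backward push visits every in-neighbor of each pushed node. Propagating only selectively, guided by Monte Carlo estimates as the paper describes, is one way to avoid that cost.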

🔎 Similar Papers

LayerPlexRank: Exploring Node Centrality and Layer Influence through Algebraic Connectivity in Multiplex Networks