BD-Index: Scalable Biharmonic Distance Queries on Large Graphs via Divide-and-Conquer Indexing

📅 2025-12-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the inefficiency of single-pair biharmonic distance queries on large graphs, this paper proposes BD-Index, a novel index structure. The paper shows that the biharmonic distance between two nodes can be interpreted as a distance between the random walk distributions starting from those nodes, and uses small cut sets to decompose the required walks into independent random walks within the separated node sets. Based on this insight, BD-Index builds a hierarchical partition tree over the graph and computes all required random walk probabilities deterministically, bottom-up. The index takes O(nh) space, O(nh(h + dₘₐₓ)) construction time, and O(nh) time per query, where n is the number of vertices, h the height of the hierarchy partition tree, and dₘₐₓ the maximum degree; h and dₘₐₓ are usually much smaller than n. Experiments demonstrate millisecond-level exact biharmonic distance queries on million-node graphs.

📝 Abstract

Biharmonic distance (BD) is a powerful graph distance metric with many applications, including identifying critical links in road networks and mitigating the over-squashing problem in GNNs. However, computing BD is extremely difficult, especially on large graphs. In this paper, we focus on the problem of the single-pair BD query. Existing methods mainly rely on random walk-based approaches, which work well on some graphs but become inefficient when the random walk cannot mix rapidly. To overcome this issue, we first show that the biharmonic distance between two nodes $s,t$, denoted by $b(s,t)$, can be interpreted as the distance between two random walk distributions starting from $s$ and $t$. To estimate these distributions, the required random walk length is large when the underlying graph can be easily cut into smaller pieces. Inspired by this observation, we present novel formulas for BD that represent $b(s,t)$ by independent random walks within two node sets $\mathcal{V}_s$, $\mathcal{V}_t$ separated by a small cut set $\mathcal{V}_{cut}$, where $\mathcal{V}_s\cup\mathcal{V}_t\cup\mathcal{V}_{cut}=\mathcal{V}$ is the set of graph nodes. Building upon this idea, we propose BD-Index, a novel index structure which follows a divide-and-conquer strategy. The graph is first cut into pieces so that each part can be processed easily. Then, all the required random walk probabilities can be deterministically computed in a bottom-up manner. When a query comes, only a small part of the index needs to be accessed. We prove that BD-Index requires $O(n\cdot h)$ space, can be built in $O(n\cdot h\cdot (h+d_{max}))$ time, and answers each query in $O(n\cdot h)$ time, where $h$ is the height of a hierarchy partition tree and $d_{max}$ is the maximum degree, which are both usually much smaller than $n$.
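For orientation, the quantity being indexed can be computed directly on a small graph. The sketch below is a brute-force baseline, not BD-Index itself. It assumes the common convention $b(s,t)=\lVert L^{\dagger}(e_s-e_t)\rVert_2$, where $L$ is the graph Laplacian and $L^{\dagger}$ its pseudoinverse; some authors work with the square of this value, and the abstract does not say which convention the paper uses. The function names are illustrative, and the graph is assumed to be connected, undirected, and unweighted.

```python
import math

def laplacian(n, edges):
    """Dense Laplacian L = D - A of an undirected, unweighted graph."""
    L = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        L[u][u] += 1.0
        L[v][v] += 1.0
        L[u][v] -= 1.0
        L[v][u] -= 1.0
    return L

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    m = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, m):
            f = M[r][col] / M[col][col]
            for c in range(col, m + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        acc = sum(M[r][c] * x[c] for c in range(r + 1, m))
        x[r] = (M[r][m] - acc) / M[r][r]
    return x

def biharmonic_distance(n, edges, s, t):
    """Assumed convention: b(s,t) = || L^+ (e_s - e_t) ||_2 on a connected graph."""
    if s == t:
        return 0.0
    L = laplacian(n, edges)
    rhs = [0.0] * n
    rhs[s], rhs[t] = 1.0, -1.0
    # Ground the last node (x[n-1] = 0); the reduced system is nonsingular
    # when the graph is connected.
    reduced = [row[: n - 1] for row in L[: n - 1]]
    x = solve(reduced, rhs[: n - 1]) + [0.0]
    # Shifting to mean zero yields the pseudoinverse solution L^+ (e_s - e_t).
    mean = sum(x) / n
    return math.sqrt(sum((xi - mean) ** 2 for xi in x))

# Path graph 0 - 1 - 2
path_edges = [(0, 1), (1, 2)]
b_02 = biharmonic_distance(3, path_edges, 0, 2)
```

This baseline costs cubic time per query because of the dense solve, which is the kind of cost an index with $O(n\cdot h)$ query time is meant to avoid on large graphs.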

Problem

Research questions and friction points this paper is trying to address.

Computes biharmonic distance queries efficiently on large graphs.

Overcomes inefficiency of random walk methods on easily separable graphs.

Uses divide-and-conquer indexing to reduce query time and space.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Divide-and-conquer indexing for biharmonic distance queries

Random walk distributions separated by small cut sets

Deterministic bottom-up computation with hierarchical partition tree
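The cut-set idea in the list above can be illustrated with a small check. This summary does not describe how the paper chooses its cut sets or builds the partition tree, so the sketch below only verifies the separator property: once the nodes in the cut set are removed, no path connects $s$ and $t$. The helper names `reachable_without_cut` and `separates` are hypothetical.

```python
from collections import deque

def reachable_without_cut(adj, cut, s):
    """Nodes reachable from s by paths that never enter the cut set."""
    cut = set(cut)
    if s in cut:
        raise ValueError("start node must not belong to the cut set")
    seen = {s}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in cut and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

def separates(adj, cut, s, t):
    """True if removing the cut set disconnects s from t."""
    return t not in set(cut) and t not in reachable_without_cut(adj, cut, s)

# Path graph 0 - 1 - 2 - 3 - 4 as an adjacency list
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
side_of_0 = reachable_without_cut(adj, [2], 0)
```

On this path graph, the single node 2 is a cut set separating nodes 0 and 4; applying such splits recursively to each side is one way a hierarchy of height $h$ could arise, though the paper's actual construction may differ.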

🔎 Similar Papers

A Universal Scheme for Dynamic Partitioned Shortest Path Index: Survey, Improvement, and Experiments