π€ AI Summary
To address the low efficiency and high false-negative rates of exact subgraph matching (i.e., query graph isomorphism detection) on large-scale graphs, this paper proposes PathDomβa false-negative-free matching framework based on path-dominance embedding. The core innovation lies in the first formal definition and learning of path embeddings that satisfy strict dominance-preserving mappings: graph isomorphism is equivalently reduced to multi-dimensional dominance relations among embedding vectors. Leveraging this reduction, we design lossless pruning strategies and cost-aware optimal query path planning. PathDom integrates GNN-driven path representation learning, multi-dimensional indexing for acceleration, parallelized graph partition traversal, and explicit dominance relation modeling. Evaluated on both real-world and synthetic datasets, PathDom achieves zero false negatives while reducing query latency by 1β3 orders of magnitude over state-of-the-art methods, significantly enhancing scalability and practicality of exact subgraph matching on large-scale graphs.
π Abstract
The classic problem of
exact subgraph matching
returns those subgraphs in a large-scale data graph that are isomorphic to a given query graph, which has gained increasing importance in many real-world applications such as social network analysis, knowledge graph discovery in the Semantic Web, bibliographical network mining, and so on. In this paper, we propose a novel and effective
graph neural network (GNN)-based path embedding framework
(GNN-PE), which allows efficient
exact subgraph matching
without introducing
false dismissals.
Unlike traditional GNN-based graph embeddings that only produce
approximate
subgraph matching results, in this paper, we carefully devise GNN-based embeddings for paths, such that: if two paths (and 1-hop neighbors of vertices on them) have the subgraph relationship, their corresponding GNN-based embedding vectors will strictly follow the dominance relationship. With such a newly designed property of path dominance embeddings, we are able to propose effective pruning strategies based on path label/dominance embeddings and guarantee no false dismissals for subgraph matching. We build multidimensional indexes over path embedding vectors, and develop an efficient subgraph matching algorithm by traversing indexes over graph partitions in parallel and applying our pruning methods. We also propose a cost-model-based query plan that obtains query paths from the query graph with low query cost. Through extensive experiments, we confirm the efficiency and effectiveness of our proposed GNN-PE approach for exact subgraph matching on both real and synthetic graph data.