Efficient Exact Subgraph Matching via GNN-based Path Dominance Embedding

📅 2023-09-27
🏛️ Proceedings of the VLDB Endowment
📈 Citations: 21
✨ Influential: 1
🤖 AI Summary
This paper targets two problems with exact subgraph matching (i.e., subgraph isomorphism search) on large-scale graphs: it is slow, and existing GNN-based embedding approaches are only approximate, so they can miss true matches (false negatives). To address them, it proposes GNN-PE, a matching framework based on path dominance embeddings that produces no false negatives. The core idea is to learn GNN-based path embeddings that preserve dominance: if a query path (together with the 1-hop neighbors of its vertices) is a subgraph of a data path, then every coordinate of the query path's embedding vector is no greater than the corresponding coordinate of the data path's embedding vector. Because dominance is a necessary condition for a match, it supports lossless pruning strategies and cost-aware query path planning. GNN-PE combines GNN-driven path representation learning, multidimensional indexes over the embeddings, and parallel traversal of graph partitions. Evaluated on both real-world and synthetic datasets, GNN-PE produces zero false negatives while reducing query latency by 1–3 orders of magnitude over state-of-the-art methods, improving the scalability and practicality of exact subgraph matching on large-scale graphs.
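The dominance test behind this lossless pruning can be sketched in a few lines. This is a minimal illustration that assumes path embeddings have already been computed as fixed-length numeric tuples. The path IDs and vector values below are made-up placeholders, not data from the paper.

```python
from typing import Dict, List, Sequence, Tuple


def dominates(data_emb: Sequence[float], query_emb: Sequence[float]) -> bool:
    """True if data_emb >= query_emb in every dimension."""
    if len(data_emb) != len(query_emb):
        raise ValueError("embeddings must have the same dimensionality")
    return all(d >= q for d, q in zip(data_emb, query_emb))


def filter_candidates(query_emb: Sequence[float],
                      data_paths: Dict[str, Tuple[float, ...]]) -> List[str]:
    """Keep only data paths whose embedding dominates the query embedding.

    This pruning is lossless only if the embedding guarantees
    subgraph(query, data) => dominates(data_emb, query_emb).
    """
    return [pid for pid, emb in data_paths.items() if dominates(emb, query_emb)]


# Made-up embeddings for three data paths and one query path.
data_paths = {
    "p1": (0.9, 0.4, 0.7),
    "p2": (0.5, 0.8, 0.9),  # 0.5 < 0.6 in the first dimension: pruned
    "p3": (1.2, 0.6, 0.3),
}
query = (0.6, 0.3, 0.2)

print(filter_candidates(query, data_paths))  # ['p1', 'p3']
```

Because dominance is only a necessary condition for a match, the surviving candidates (here `p1` and `p3`) still have to be verified exactly. Pruning only discards paths that cannot match.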
πŸ“ Abstract
The classic problem of exact subgraph matching, which returns the subgraphs of a large-scale data graph that are isomorphic to a given query graph, has gained increasing importance in many real-world applications such as social network analysis, knowledge graph discovery in the Semantic Web, and bibliographical network mining. In this paper, we propose a novel and effective graph neural network (GNN)-based path embedding framework (GNN-PE), which allows efficient exact subgraph matching without introducing false dismissals. Unlike traditional GNN-based graph embeddings, which only produce approximate subgraph matching results, we carefully devise GNN-based embeddings for paths such that, if two paths (together with the 1-hop neighbors of the vertices on them) have a subgraph relationship, their corresponding GNN-based embedding vectors strictly follow the dominance relationship. With this newly designed property of path dominance embeddings, we propose effective pruning strategies based on path label/dominance embeddings that guarantee no false dismissals for subgraph matching. We build multidimensional indexes over path embedding vectors and develop an efficient subgraph matching algorithm that traverses the indexes over graph partitions in parallel and applies our pruning methods. We also propose a cost-model-based query plan that selects query paths from the query graph with low query cost. Through extensive experiments on both real and synthetic graph data, we confirm the efficiency and effectiveness of GNN-PE for exact subgraph matching.
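Why a dominance-preserving embedding yields no false dismissals can be seen with a hand-crafted encoding that is monotone by construction. The sketch below is a stand-in for the trained GNN, not the paper's model. It assumes a hypothetical label alphabet `LABELS` and encodes each path by recording, for every vertex on it, how many 1-hop neighbors carry each label. If the query path and its neighbors map injectively, with labels preserved, onto a data path and its neighbors, every count can only stay the same or grow, so the data vector dominates the query vector. Path labels are compared exactly first, in the spirit of the label-based pruning the abstract mentions. The helper names and toy graphs are illustrative only.

```python
from collections import Counter

# Hypothetical label alphabet; a real graph would have its own label set.
LABELS = ("A", "B", "C")


def path_labels(path, labels):
    """Sequence of vertex labels along a path (used for exact label pruning)."""
    return tuple(labels[v] for v in path)


def path_embedding(path, labels, adj):
    """Hand-crafted stand-in for a learned GNN path embedding.

    For each vertex on the path, append how many of its 1-hop neighbors
    carry each label in LABELS. Under a label-preserving injective mapping
    into a larger neighborhood these counts can only stay equal or grow,
    so the encoding is dominance-preserving by construction.
    """
    vec = []
    for v in path:
        counts = Counter(labels[u] for u in adj[v])
        vec.extend(counts[label] for label in LABELS)
    return tuple(vec)


def is_candidate(q_path, q_labels, q_adj, d_path, d_labels, d_adj):
    """Label pruning first, then the dominance check on the embeddings."""
    if path_labels(q_path, q_labels) != path_labels(d_path, d_labels):
        return False
    q_emb = path_embedding(q_path, q_labels, q_adj)
    d_emb = path_embedding(d_path, d_labels, d_adj)
    return all(d >= q for d, q in zip(d_emb, q_emb))


# Query graph: path 0(A) - 1(B) - 2(C).
q_labels = {0: "A", 1: "B", 2: "C"}
q_adj = {0: {1}, 1: {0, 2}, 2: {1}}

# Data graph: 10(A) - 11(B) - 12(C), plus neighbor 13(A) of 11 and 14(B) of 12.
d_labels = {10: "A", 11: "B", 12: "C", 13: "A", 14: "B"}
d_adj = {10: {11}, 11: {10, 12, 13}, 12: {11, 14}, 13: {11}, 14: {12}}

print(is_candidate((0, 1, 2), q_labels, q_adj, (10, 11, 12), d_labels, d_adj))  # True
print(is_candidate((0, 1, 2), q_labels, q_adj, (12, 11, 10), d_labels, d_adj))  # False: labels differ
```

Raw neighbor counts grow with path length and label-set size. A learned embedding of fixed dimension, as GNN-PE uses, is what makes the vectors practical to index.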
Problem

Research questions and friction points this paper is trying to address.

Efficient exact subgraph matching in large-scale graphs
Eliminating false dismissals with GNN-based path embeddings
Optimizing query processing via path dominance embedding properties
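The abstract also mentions a cost-model-based query plan that selects query paths with low query cost. The paper's actual cost model is not given here, so the sketch below swaps in a generic greedy weighted set cover. It takes hypothetical candidate query paths and an assumed per-path cost estimate (for example, an estimated number of candidate data paths). It then repeatedly picks the path with the lowest cost per newly covered query edge until every query edge is covered.

```python
def path_edges(path):
    """Undirected edges traversed by a path given as a vertex tuple."""
    return {frozenset(e) for e in zip(path, path[1:])}


def greedy_query_plan(query_edges, candidate_paths, est_cost):
    """Greedy weighted set cover over query edges (hypothetical cost model).

    Repeatedly picks the candidate path with the lowest estimated cost per
    newly covered query edge until every query edge is covered.
    """
    uncovered = {frozenset(e) for e in query_edges}
    plan = []
    while uncovered:
        best, best_ratio = None, float("inf")
        for p in candidate_paths:
            gain = len(path_edges(p) & uncovered)
            if gain == 0:
                continue  # path adds no new coverage
            ratio = est_cost[p] / gain
            if ratio < best_ratio:
                best, best_ratio = p, ratio
        if best is None:
            raise ValueError("candidate paths do not cover every query edge")
        plan.append(best)
        uncovered -= path_edges(best)
    return plan


# Query graph: triangle 0-1-2 with a tail 2-3.
query_edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
# Hypothetical length-2 query paths and assumed cost estimates.
est_cost = {(0, 1, 2): 50, (1, 2, 3): 10, (1, 0, 2): 40, (0, 2, 3): 30}

print(greedy_query_plan(query_edges, list(est_cost), est_cost))
# [(1, 2, 3), (1, 0, 2)]
```

Greedy set cover is a standard approximation rather than an optimal plan. It is used here only to show how a per-path cost estimate can drive the choice of query paths.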
Innovation

Methods, ideas, or system contributions that make the work stand out.

GNN-based path dominance embeddings guarantee no false dismissals in exact subgraph matching
Multidimensional indexes enable parallel traversal of graph partitions
Path group embeddings with pruning reduce search space
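Partition-level pruning with parallel traversal can be sketched with a deliberately simple index. Each partition stores only the component-wise maximum of its path embeddings, similar to the upper corner of a bounding box in an R-tree. If that bound does not dominate the query embedding, no path in the partition can, so the whole partition is skipped without losing answers. The partition contents are made-up placeholders, and this is not the paper's actual multidimensional index. Python threads show the structure only: CPU-bound work would need processes or a native implementation to run truly in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


def dominates(a, b):
    """True if vector a is >= vector b in every dimension."""
    return all(x >= y for x, y in zip(a, b))


def upper_bound(embs):
    """Component-wise maximum of all path embeddings in one partition."""
    return tuple(max(col) for col in zip(*embs))


def search_partition(partition, query_emb):
    """Scan one partition, skipping it entirely if its bound cannot dominate."""
    if not partition or not dominates(upper_bound(partition.values()), query_emb):
        # Every path in the partition is <= the bound, so none can dominate the query.
        return []
    return [pid for pid, emb in partition.items() if dominates(emb, query_emb)]


def parallel_search(partitions, query_emb, workers=4):
    """Search all partitions concurrently and merge the candidate path IDs."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda part: search_partition(part, query_emb), partitions)
        return sorted(pid for hits in results for pid in hits)


# Made-up partitions mapping path IDs to 2-dimensional embeddings.
partitions = [
    {"a1": (0.2, 0.1), "a2": (0.3, 0.2)},  # bound (0.3, 0.2): pruned as a whole
    {"b1": (0.9, 0.5), "b2": (0.4, 0.9)},  # bound (0.9, 0.9): scanned
]
query = (0.5, 0.4)

print(parallel_search(partitions, query))  # ['b1']
```

The bound test is safe because any path that dominates the query is itself dominated by its partition's bound, so the bound must dominate the query too.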