🤖 AI Summary
Subgraph matching is NP-hard, and existing approaches lose efficiency to redundant computation during enumeration. This work proposes the CEMR algorithm, which eliminates redundant extensions within a depth-first search framework to improve matching efficiency. Its core techniques are merging common extensions, supported by a black-white vertex encoding, and reusing common extensions, supported by a common extension buffer. Two pruning strategies additionally discard unpromising search branches early. Experiments on real-world graph datasets and diverse query workloads show that CEMR outperforms state-of-the-art subgraph matching algorithms.
📝 Abstract
Subgraph matching is a fundamental problem in graph analysis with a wide range of applications. However, due to its inherent NP-hardness, enumerating subgraph matches efficiently on large real-world graphs remains highly challenging. Most existing works adopt a depth-first search (DFS) backtracking strategy, where a partial embedding is gradually extended in a DFS manner along a branch of the search tree until either a full embedding is found or no further extension is possible. A major limitation of this paradigm is the significant amount of duplicate computation that occurs during enumeration, which increases the overall runtime. To overcome this limitation, we propose a novel subgraph matching algorithm, CEMR. It incorporates two techniques to reduce duplicate extensions: common extension merging, which leverages a black-white vertex encoding, and common extension reusing, which employs common extension buffers. In addition, we design two pruning techniques to discard unpromising search branches. Extensive experiments on real-world datasets and diverse query workloads demonstrate that CEMR outperforms state-of-the-art subgraph matching methods.
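For readers unfamiliar with the DFS backtracking paradigm that the abstract describes as the common baseline, the following is a minimal sketch of it. It is not CEMR and does not include merging, reusing, or pruning. The names `Graph` and `match`, the adjacency-set representation, the unlabeled undirected graphs, and the naive matching order (query vertices in dictionary order) are all assumptions made for illustration.

```python
from typing import Dict, Iterator, Set

# Assumed representation: undirected, unlabeled graph as vertex -> neighbor set.
Graph = Dict[int, Set[int]]


def match(query: Graph, data: Graph) -> Iterator[Dict[int, int]]:
    """Yield every injective, edge-preserving mapping of query into data."""
    order = list(query)  # naive matching order (assumption, not optimized)

    def extend(depth: int, embedding: Dict[int, int], used: Set[int]):
        if depth == len(order):
            yield dict(embedding)  # full embedding found
            return
        u = order[depth]
        # Images of the already-mapped query neighbors of u.
        mapped = [embedding[w] for w in query[u] if w in embedding]
        if mapped:
            # Candidates must be adjacent to every mapped neighbor's image.
            candidates = set.intersection(*(data[v] for v in mapped))
        else:
            candidates = set(data)
        for v in sorted(candidates - used):
            embedding[u] = v
            used.add(v)
            yield from extend(depth + 1, embedding, used)
            used.remove(v)  # backtrack
            del embedding[u]

    yield from extend(0, {}, set())


# Usage: count triangle embeddings in a small data graph.
data = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
print(len(list(match(triangle, data))))  # 6 (one triangle, 3! orderings)
```

In this sketch the candidate set for a query vertex depends only on the images of its already-mapped neighbors, so partial embeddings that differ only in other vertices recompute the same intersection. That is one plausible reading of the duplicate computation the abstract refers to.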