🤖 AI Summary
Approximate subgraph matching (ASM) is the NP-hard problem of determining whether a query graph approximately exists within a large-scale target graph. This work proposes a novel approach built on a branch-and-bound framework that uniquely integrates graph Transformers with reinforcement learning. The method leverages graph Transformers to capture global structural information, and trains the matching policy by imitation-learning pretraining followed by Proximal Policy Optimization (PPO) fine-tuning, so as to maximize long-term matching rewards. On both synthetic and real-world datasets, the proposed approach significantly outperforms existing state-of-the-art methods in both matching accuracy and computational efficiency.
📝 Abstract
Approximate subgraph matching (ASM) is the task of determining the approximate presence of a given query graph in a large target graph. Although NP-hard, ASM is central to graph analysis, with applications ranging from database systems and network science to biochemistry and privacy. Existing techniques often employ heuristic search strategies that cannot fully utilize the graph information, leading to sub-optimal solutions. This paper proposes a Reinforcement Learning based Approximate Subgraph Matching (RL-ASM) algorithm that combines graph Transformers, which extract expressive graph representations, with an RL-based matching policy. Our model is built upon a branch-and-bound algorithm that selects one pair of nodes from the two input graphs at a time as a potential match. Instead of relying on heuristics, we exploit a graph Transformer architecture to extract feature representations that encode the full graph information. To enhance the training of the RL policy, we first use supervised signals to guide our agent in an imitation learning stage. Subsequently, the policy is fine-tuned with Proximal Policy Optimization (PPO), which optimizes the cumulative long-term reward over episodes. Extensive experiments on both synthetic and real-world datasets demonstrate that RL-ASM outperforms existing methods in terms of effectiveness and efficiency. Our source code is available at https://github.com/KaiyangLi1992/RL-ASM.
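To make the branch-and-bound framework described above concrete, here is a minimal, hedged sketch (not the paper's implementation): it grows a query-to-target node mapping one pair at a time, prunes branches whose cost already exceeds the best found so far, and orders candidate pairs with a pluggable scoring function. The `policy_score` argument is where the paper's learned graph-Transformer/RL policy would plug in; the degree-similarity default below is only a stand-in heuristic, and the edge-violation cost is one simple choice of approximate-matching objective.

```python
def approx_subgraph_match(query, target, policy_score=None):
    """Branch-and-bound ASM sketch.

    `query` and `target` are undirected graphs as {node: set(neighbors)}.
    Finds the query->target node mapping that violates the fewest query
    edges (a simple edit-style cost). `policy_score(q, t, mapping)` ranks
    candidate pairs -- a learned RL policy would go here; by default we
    prefer target nodes with a similar degree to the query node.
    """
    qnodes = list(query)
    if policy_score is None:
        policy_score = lambda q, t, mapping: -abs(len(query[q]) - len(target[t]))
    best = {"cost": float("inf"), "mapping": None}

    def cost_of(mapping):
        # Count query edges whose endpoints are both mapped but whose
        # images are not adjacent in the target (each edge seen twice).
        c = 0
        for u in mapping:
            for v in query[u]:
                if v in mapping and mapping[v] not in target[mapping[u]]:
                    c += 1
        return c // 2

    def branch(mapping, used):
        c = cost_of(mapping)
        if c >= best["cost"]:          # bound: prune dominated branches
            return
        if len(mapping) == len(qnodes):
            best["cost"], best["mapping"] = c, dict(mapping)
            return
        q = qnodes[len(mapping)]       # next query node to match
        cands = [t for t in target if t not in used]
        cands.sort(key=lambda t: policy_score(q, t, mapping), reverse=True)
        for t in cands:                # branch: try each candidate pair (q, t)
            mapping[q] = t
            used.add(t)
            branch(mapping, used)
            del mapping[q]
            used.remove(t)

    branch({}, set())
    return best["cost"], best["mapping"]
```

For example, matching a triangle query into a target that contains a triangle yields cost 0 (an exact embedding), while matching it into a star graph yields cost 1, since one triangle edge must go unmatched. A better `policy_score` does not change the optimum, only how quickly the bound prunes the search, which is precisely the role the learned policy plays in the paper's framework.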