🤖 AI Summary
This work addresses frequent induced subgraph mining, a task made computationally expensive by the NP-hard subgraph counting it requires. Traditional enumeration methods scale poorly, and existing approaches typically rely on the downward-closure property to prune the search space, which constrains how frequency can be measured. The authors propose the first formulation of this task as a Markov decision process and introduce a search framework based on multi-task reinforcement learning. Their approach uses a task-aware graph neural network to guide subgraph exploration, so it does not depend on the downward-closure property. Notably, the algorithm achieves time complexity linear in the subgraph size \(k\). Experiments on real-world datasets show that the method finds subgraphs whose frequencies closely match those of the true most frequent \(k\)-subgraphs, runs significantly faster than baseline approaches, and has more stable running times.
📝 Abstract
Identifying the most frequent induced subgraph of size $k$ in a target graph is a fundamental graph mining problem with direct implications for Web-related data mining and social network analysis. Despite its importance, finding the most frequent induced subgraph remains computationally expensive because the underlying subgraph counting task is NP-hard. Traditional exact enumeration algorithms often suffer from high time complexity, especially for large subgraph sizes $k$. To mitigate this, existing approaches often use frequency measures that satisfy the Downward Closure Property to reduce the search space, which imposes additional constraints on the task. In this paper, we first formulate this task as a Markov Decision Process and approach it using a multi-task reinforcement learning framework. Specifically, we introduce RLMiner, a novel framework that integrates reinforcement learning with our proposed task-state-aware Graph Neural Network to find the most frequent induced subgraph of size $k$ with a time complexity linear in $k$. Extensive experiments on real-world datasets demonstrate that RLMiner identifies subgraphs whose frequencies closely match those of the ground-truth most frequent induced subgraphs, while achieving significantly shorter and more stable running times than traditional methods.
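
To make the problem statement concrete, the following is a minimal brute-force sketch of the exact enumeration baseline that the abstract describes as expensive. It is not RLMiner and is not taken from the paper. The function names, the adjacency-dictionary input, the restriction to connected subgraphs, and the definition of frequency as the number of $k$-node subsets inducing a given pattern are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations, permutations


def is_connected(nodes, edges):
    """Return True if the graph on `nodes` with `edges` is connected."""
    nodes = list(nodes)
    neighbors = {n: set() for n in nodes}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    seen = {nodes[0]}
    stack = [nodes[0]]
    while stack:
        cur = stack.pop()
        for nxt in neighbors[cur]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) == len(nodes)


def canonical_form(nodes, edges):
    """Smallest sorted edge tuple over all relabelings to 0..k-1.

    Tries all k! relabelings, so it is only usable for very small k.
    """
    nodes = list(nodes)
    best = None
    for perm in permutations(range(len(nodes))):
        relabel = dict(zip(nodes, perm))
        form = tuple(sorted(
            tuple(sorted((relabel[u], relabel[v]))) for u, v in edges
        ))
        if best is None or form < best:
            best = form
    return best


def most_frequent_induced_subgraph(adj, k):
    """Exhaustively count connected induced k-subgraphs of `adj`.

    `adj` maps each node to the set of its neighbors (undirected graph).
    Assumption: frequency = number of k-node subsets inducing the pattern.
    Returns (canonical_pattern, count), or None if no connected subset exists.
    """
    counts = Counter()
    for subset in combinations(sorted(adj), k):
        members = set(subset)
        # Induced edges: every edge of the target graph between chosen nodes.
        edges = [(u, v) for u in subset for v in adj[u]
                 if v in members and u < v]
        if is_connected(subset, edges):
            counts[canonical_form(subset, edges)] += 1
    return counts.most_common(1)[0] if counts else None


# Triangle 0-1-2 with a pendant node 3 attached to node 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
pattern, count = most_frequent_induced_subgraph(adj, 3)
print(pattern, count)
```

The outer loop visits every $k$-node subset, so the cost grows combinatorially with the graph size and with $k$. This is the scaling behavior that motivates replacing enumeration with a learned search.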