AI Summary
This paper addresses the resource-constrained community search problem in heterogeneous information networks (HINs) and introduces a novel task: *size-constrained cohesive community discovery*, that is, finding the most structurally cohesive community of a user-specified size s that contains a given query node. To capture cohesion in HINs, we refine the *(k, P)-truss* model to incorporate cross-type edge constraints. We propose a branch-and-bound (B&B) framework with structure-aware bounding, branching, total ordering, and candidate-reduction rules, and derive from it two exact algorithms that enumerate edge combinations and node combinations, respectively. A heuristic algorithm that exploits HIN structure quickly produces a high-quality initial solution, which serves as a global lower bound for pruning and improves scalability. Extensive experiments on multiple real-world HIN datasets demonstrate that our methods significantly outperform state-of-the-art baselines in both solution quality and efficiency.
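The summary does not spell out the refined model, so the following is only a minimal sketch of the common (k, P)-truss reading it builds on. It assumes that P is a symmetric meta-path (for example author-paper-author), that the HIN is first projected onto target-type nodes joined whenever some instance of P links them, and that a (k, P)-truss keeps only projected edges lying in at least k - 2 triangles. The helpers `project`, `k_truss`, and `component` are hypothetical names, and the cross-type edge constraints of the refined model are not modeled.

```python
def project(node_type, adj, path):
    """Hypothetical helper: P-induced homogeneous graph over nodes of type
    path[0]; u ~ v iff some instance of the meta-path `path` links them.
    Assumes `path` is symmetric (starts and ends with the same type)."""
    g = {u: set() for u, t in node_type.items() if t == path[0]}
    for u in g:
        frontier = {u}
        for t in path[1:]:  # follow the meta-path one node type at a time
            frontier = {w for x in frontier for w in adj[x] if node_type[w] == t}
        g[u] = frontier - {u}  # no self-loops
    return g


def k_truss(g, k):
    """Repeatedly delete edges lying in fewer than k - 2 triangles
    (assumed truss convention); nodes left without edges are dropped."""
    h = {u: set(vs) for u, vs in g.items()}
    changed = True
    while changed:
        changed = False
        for u in list(h):
            for v in list(h[u]):
                if u < v and len(h[u] & h[v]) < k - 2:
                    h[u].discard(v)
                    h[v].discard(u)
                    changed = True
    return {u: vs for u, vs in h.items() if vs}


def component(h, q):
    """Connected component of q in h (empty if q was peeled away)."""
    if q not in h:
        return set()
    seen, stack = {q}, [q]
    while stack:
        u = stack.pop()
        for v in h[u] - seen:
            seen.add(v)
            stack.append(v)
    return seen


if __name__ == "__main__":
    # Toy bibliographic HIN: authors a1..a4, papers p1, p2; P = A-P-A.
    node_type = {"a1": "A", "a2": "A", "a3": "A", "a4": "A", "p1": "P", "p2": "P"}
    adj = {"a1": {"p1"}, "a2": {"p1"}, "a3": {"p1", "p2"}, "a4": {"p2"},
           "p1": {"a1", "a2", "a3"}, "p2": {"a3", "a4"}}
    h = k_truss(project(node_type, adj, ["A", "P", "A"]), k=3)
    print(sorted(component(h, "a1")))  # ['a1', 'a2', 'a3']
```

In the toy example, the edge a3-a4 lies in no triangle of the projection, so it is peeled for k = 3 and a4 drops out of the query's community. Note that this unconstrained truss has no size bound; enforcing an exact size s is what makes the problem NP-hard.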
Abstract
The goal of community search in heterogeneous information networks (HINs) is to identify a set of closely related target nodes that includes a query target node. In practice, a size constraint is often imposed due to limited resources, which has been overlooked by most existing HIN community search studies. In this paper, we introduce the size-bounded community search problem to HINs. Specifically, we propose a refined (k, P)-truss model to measure community cohesiveness, aiming to identify the most cohesive community of size s that contains the query node. We prove that this problem is NP-hard. To solve it, we develop a novel B&B framework that efficiently generates target node sets of size s. We then tailor novel bounding, branching, total ordering, and candidate reduction optimisations, which enable the framework to reach the optimal result efficiently. We also design a heuristic algorithm that leverages structural properties of HINs to efficiently obtain a high-quality initial solution, which serves as a global lower bound to further strengthen the above optimisations. Building upon these, we propose two exact algorithms that enumerate combinations of edges and nodes, respectively. Extensive experiments on real-world datasets demonstrate the effectiveness and efficiency of the proposed methods.
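The interplay of branching, total ordering, bounding, and a heuristic global lower bound can be illustrated with a small node-enumeration sketch. It is not the authors' algorithm: it assumes the HIN has already been projected onto target nodes via P, and it uses a stand-in score (the minimum number of triangles containing an edge of the connected induced subgraph G[S]). The names `cohesiveness`, `upper_bound`, `greedy_init`, and `size_bounded_search`, the greedy rule, and the simple support-based bound are all hypothetical. The search is exhaustive in the worst case, so it is only meant for toy graphs.

```python
from itertools import combinations


def cohesiveness(g, S):
    """Assumed score: minimum triangle support over the edges of G[S];
    -1 if G[S] is disconnected, 0 for a single node."""
    S = set(S)
    start = next(iter(S))
    seen, stack = {start}, [start]
    while stack:  # DFS restricted to S
        u = stack.pop()
        for v in g[u] & S:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    if seen != S:
        return -1
    supports = [len(g[u] & g[v] & S)
                for u, v in combinations(sorted(S), 2) if v in g[u]]
    return min(supports) if supports else 0


def upper_bound(g, S, C, s):
    """Any completion S' of S drawn from S | C keeps every edge of G[S],
    so its score is at most that edge's support within S | C."""
    pool = S | set(C)
    ub = s - 2  # an edge in a size-s set lies in at most s - 2 triangles
    for u, v in combinations(sorted(S), 2):
        if v in g[u]:
            ub = min(ub, len(g[u] & g[v] & pool))
    return ub


def greedy_init(g, q, s):
    """Hypothetical heuristic: grow from q, adding the frontier node with
    the most links into the current set (ties: smallest id)."""
    S = {q}
    while len(S) < s:
        frontier = set().union(*(g[u] for u in S)) - S
        if not frontier:
            return None
        S.add(max(sorted(frontier), key=lambda v: len(g[v] & S)))
    return S


def size_bounded_search(g, q, s):
    """Exact include/exclude branch-and-bound over size-s sets containing q."""
    if s == 1:
        return {q}, 0
    init = greedy_init(g, q, s)
    best = (cohesiveness(g, init), init) if init else (-1, None)
    cands = sorted(v for v in g if v != q)  # fixed total order avoids duplicates

    def branch(S, C):
        nonlocal best
        if len(S) == s:
            score = cohesiveness(g, S)
            if score > best[0]:
                best = (score, S)
            return
        if len(S) + len(C) < s:
            return  # too few candidates left to reach size s
        if upper_bound(g, S, C, s) <= best[0]:
            return  # cannot beat the incumbent (global lower bound)
        v, rest = C[0], C[1:]
        branch(S | {v}, rest)  # include v
        branch(S, rest)        # exclude v

    branch({q}, cands)
    return best[1], best[0]


if __name__ == "__main__":
    # Toy projected graph: a 4-clique {0,1,2,3}; node 4 links to 0, 1, 5.
    g = {0: {1, 2, 3, 4}, 1: {0, 2, 3, 4}, 2: {0, 1, 3},
         3: {0, 1, 2}, 4: {0, 1, 5}, 5: {4}}
    print(size_bounded_search(g, 0, 4))  # ({0, 1, 2, 3}, 2)
```

Here the greedy start already finds the 4-clique, whose score of 2 equals the trivial bound s - 2, so the root is pruned immediately; this is the effect a strong initial lower bound is meant to have. The abstract's edge-enumeration variant and its tailored bounds are not reproduced.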