🤖 AI Summary
This paper addresses dense subgraph mining in networks whose edges are labeled strong or weak, combining dense subgraph discovery with Strong Triadic Closure (STC) constraints. To capture both subgraph density and the strength of relationships, we score a subgraph by its number of strong edges plus λ times its number of weak edges, divided by its number of nodes, and jointly optimize the choice of subgraph and its STC-labeling. We prove the problem NP-hard by characterizing its extremes: at λ=1 it is equivalent to the densest subgraph problem, and at λ=0 to the maximum clique problem. We develop an exact integer linear programming (ILP) formulation and four practical polynomial-time heuristics. Experiments show that the algorithms recover the ground truth on synthetic data and run efficiently on real-world networks.
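The objective can be written compactly as below. This is a sketch in notation introduced here, not necessarily the paper's own: $S$ is the chosen node set, $E(S)$ its induced edges, and $E_s$, $E_w$ a split of $E(S)$ into strong and weak edges. Taking the subgraph to be induced is an assumption, though it appears harmless for $\lambda \ge 0$, because keeping an extra edge as weak never lowers the score and never breaks STC.

```latex
\max_{\substack{S \subseteq V \\ E_s \cup E_w = E(S),\; E_s \cap E_w = \emptyset}}
  \frac{|E_s| + \lambda\,|E_w|}{|S|}
\quad \text{subject to} \quad
(u,v) \in E_s \text{ and } (u,w) \in E_s \;\Rightarrow\; (v,w) \in E(S).
```

With $\lambda = 1$ the numerator equals $|E(S)|$ for every labeling, which is exactly the densest-subgraph objective; with $\lambda = 0$ only strong edges count.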
📝 Abstract
Finding dense subgraphs is a core problem with numerous graph mining applications, such as community detection in social networks and anomaly detection. However, in many real-world networks, connections are not equal. One way to label edges as either strong or weak is to use strong triadic closure (STC): if one node connects strongly to two other nodes, then those two nodes must be connected by at least a weak edge. STC-labelings are not unique, and finding one with the maximum number of strong edges is NP-hard. In this paper, we apply STC to dense subgraph discovery. More formally, the score of a subgraph is the number of strong edges plus the number of weak edges weighted by a user parameter $\lambda$, divided by the number of nodes in the subgraph. Our goal is to find a subgraph and an STC-labeling that maximize this score. We show that for $\lambda = 1$ our problem is equivalent to finding the densest subgraph, while for $\lambda = 0$ it is equivalent to finding the largest clique, which makes our problem NP-hard. We propose an exact algorithm based on integer linear programming and four practical polynomial-time heuristics. An extensive experimental study shows that our algorithms can find the ground truth in synthetic datasets and run efficiently on real-world datasets.
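To make the STC condition and the $\lambda$-weighted score concrete, here is a minimal brute-force sketch in Python. It assumes an undirected simple graph given as an edge list and an induced subgraph; the function names `is_stc_valid` and `best_stc_subgraph` are hypothetical. It enumerates every node subset and every strong/weak labeling of the induced edges, so it is exponential and only usable on toy graphs. It is not the paper's ILP or any of its heuristics, but it can serve as a reference oracle when checking those on tiny inputs.

```python
from itertools import combinations, product


def is_stc_valid(edges, strong):
    """Return True if the strong edges satisfy strong triadic closure.

    STC: whenever u-v and u-w are both strong, v and w must be adjacent
    (by a strong or a weak edge) in the given edge list.
    """
    edge_set = {frozenset(e) for e in edges}
    strong_nbrs = {}
    for u, v in strong:
        strong_nbrs.setdefault(u, set()).add(v)
        strong_nbrs.setdefault(v, set()).add(u)
    for nbrs in strong_nbrs.values():
        # Every pair of strong neighbours of the same node must be adjacent.
        for v, w in combinations(nbrs, 2):
            if frozenset((v, w)) not in edge_set:
                return False
    return True


def best_stc_subgraph(nodes, edges, lam):
    """Exhaustively maximise (|strong| + lam * |weak|) / |S| over all node
    subsets S and all STC-valid labelings of the edges induced by S.

    Exponential in both |V| and |E|: only usable on toy graphs.
    """
    best_score, best_nodes, best_strong = 0.0, None, None
    for k in range(1, len(nodes) + 1):
        for subset in combinations(nodes, k):
            s = set(subset)
            induced = [e for e in edges if e[0] in s and e[1] in s]
            # Try every strong/weak assignment of the induced edges.
            for labels in product((True, False), repeat=len(induced)):
                strong = [e for e, is_strong in zip(induced, labels) if is_strong]
                if not is_stc_valid(induced, strong):
                    continue
                num_weak = len(induced) - len(strong)
                score = (len(strong) + lam * num_weak) / k
                if score > best_score:
                    best_score, best_nodes, best_strong = score, s, strong
    return best_score, best_nodes, best_strong


if __name__ == "__main__":
    cycle4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
    for lam in (1.0, 0.5, 0.0):
        score, s, strong = best_stc_subgraph([0, 1, 2, 3], cycle4, lam)
        print(lam, score, sorted(s), strong)
```

On the 4-cycle this gives scores 1.0, 0.75 and 0.5 for $\lambda$ = 1, 0.5 and 0. At $\lambda = 0$, two adjacent strong edges would require the missing diagonal, so the strong edges can form at most a matching and the best score equals that of a single strong edge, the largest clique in a 4-cycle.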