🤖 AI Summary
This study addresses the problem of identifying the $k$ most critical contributor nodes in bipartite dependency networks, namely those whose removal isolates the largest number of items. The authors formalize it as the CriticalSet problem, prove that it is NP-hard, and show that its objective function is supermodular. Drawing on cooperative game theory, they introduce ShapleyCov, a Shapley value-based centrality measure with a closed-form solution, and propose MinCov, a linear-time iterative peeling algorithm that prioritizes contributors providing unique support to items. Extensive experiments on real-world and synthetic datasets, including a Wikipedia graph with over 250 million edges, show that MinCov comes within 0.02 AUC of a stochastic hill-climbing metaheuristic while running several orders of magnitude faster, and that both MinCov and ShapleyCov substantially outperform conventional baselines.
📝 Abstract
Identifying critical nodes in complex networks is a fundamental task in graph mining. Yet methods that address all-or-nothing coverage mechanics in bipartite dependency networks, graphs with two types of nodes in which edges represent dependencies across the two groups only, remain largely unexplored. We formalize the CriticalSet problem: given an arbitrary bipartite graph modeling dependencies of items on contributors, identify the set of $k$ contributors whose removal isolates the largest number of items. We prove that this problem is NP-hard and that it amounts to maximizing a supermodular set function, for which standard forward greedy algorithms provide no approximation guarantees. Consequently, we model CriticalSet as a coalitional game and derive a closed-form centrality, ShapleyCov, based on the Shapley value. This measure can be interpreted as the expected number of items isolated by a contributor's departure. Leveraging these insights, we propose MinCov, a linear-time iterative peeling algorithm that explicitly accounts for connection redundancy, prioritizing contributors who uniquely support many items. Extensive experiments on synthetic and large-scale real datasets, including a Wikipedia graph with over 250 million edges, reveal that MinCov and ShapleyCov significantly outperform traditional baselines. Notably, MinCov achieves near-optimal performance, within 0.02 AUC of a Stochastic Hill Climbing metaheuristic, while remaining several orders of magnitude faster.
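To make the all-or-nothing objective concrete, the following is a minimal sketch, not the authors' implementation. It assumes an item counts as isolated only once every one of its contributors has been removed, and that the coalitional game uses that count as its characteristic function; the paper's exact definitions may differ. Under that assumption the game is a sum of unanimity games, so the Shapley value of a contributor is the sum of 1/deg(item) over the items it supports. The names `DEPS`, `isolated_count`, `brute_force_critical_set`, `forward_greedy`, and `unanimity_shapley` are hypothetical and chosen only for illustration.

```python
from itertools import combinations

# Toy dependency network (hypothetical data): item -> set of contributors.
DEPS = {
    "i1": {"a", "b"},
    "i2": {"a", "b"},
    "i3": {"a", "b"},
    "i4": {"c"},
}

def isolated_count(item_deps, removed):
    """Number of items whose contributors have ALL been removed."""
    removed = set(removed)
    # Items with no contributors are skipped: no removal caused their isolation.
    return sum(1 for deps in item_deps.values() if deps and deps <= removed)

def contributors_of(item_deps):
    """Sorted list of all contributors appearing in the network."""
    return sorted({c for deps in item_deps.values() for c in deps})

def brute_force_critical_set(item_deps, k):
    """Exhaustive search over all k-subsets; only feasible on tiny graphs."""
    return max(combinations(contributors_of(item_deps), k),
               key=lambda s: isolated_count(item_deps, s))

def forward_greedy(item_deps, k):
    """Standard forward greedy: repeatedly add the best single contributor."""
    chosen = []
    for _ in range(k):
        candidates = [c for c in contributors_of(item_deps) if c not in chosen]
        best = max(candidates,
                   key=lambda c: isolated_count(item_deps, chosen + [c]))
        chosen.append(best)
    return chosen

def unanimity_shapley(item_deps):
    """Shapley value of the assumed game: each item splits one unit
    equally among its contributors."""
    scores = {}
    for deps in item_deps.values():
        for c in deps:
            scores[c] = scores.get(c, 0.0) + 1.0 / len(deps)
    return scores

if __name__ == "__main__":
    print(brute_force_critical_set(DEPS, 2), forward_greedy(DEPS, 2))
    print(unanimity_shapley(DEPS))
```

On this toy graph with $k = 2$, exhaustive search returns `('a', 'b')`, which isolates three items, whereas forward greedy first picks `c` (the only contributor with a positive gain on its own) and ends up isolating a single item. This is the kind of failure that supermodularity permits: the value of removing `a` only appears once `b` is removed as well. The Shapley-style scores (1.5 for `a` and `b`, 1.0 for `c`) rank the jointly critical pair higher without any search.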