🤖 AI Summary
Existing GPU-based approaches to the Minimum/Bounded Vertex Cover problem suffer from redundant computation and load imbalance because they cannot dynamically detect the independent connected components that arise when the graph splits; in addition, their high memory overhead severely limits concurrency.
Method: This work presents the first load-balanced parallelization of non-tail-recursive branching on GPUs, introducing a component-aware branching mechanism and a descendant-node post-processing strategy to eliminate duplicate subproblem solving. It integrates connected-component detection, graph reduction, induced-subgraph construction, and optimized work-queue management.
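To make component-aware branching concrete, here is a minimal sequential Python sketch (not the authors' GPU implementation; the function names `components` and `mvc` are illustrative). When the residual graph splits, each component is branched on independently and the per-component minima are summed, instead of redundantly re-exploring every component in every branch:

```python
def components(adj, vertices):
    """Split the active vertex set into connected components (iterative DFS)."""
    seen, comps = set(), []
    for s in vertices:
        if s in seen:
            continue
        comp, stack = [], [s]
        seen.add(s)
        while stack:
            u = stack.pop()
            comp.append(u)
            for v in adj[u]:
                if v in vertices and v not in seen:
                    seen.add(v)
                    stack.append(v)
        comps.append(comp)
    return comps

def mvc(adj, vertices):
    """Size of a minimum vertex cover of the subgraph induced by `vertices`."""
    # Drop isolated vertices: they never belong to a minimum cover.
    active = {v for v in vertices if any(u in vertices for u in adj[v])}
    if not active:
        return 0
    comps = components(adj, active)
    if len(comps) > 1:
        # Component-aware branching: solve each component independently and
        # aggregate by summation. This aggregation step is what makes the
        # recursion non-tail-recursive.
        return sum(mvc(adj, set(c)) for c in comps)
    # Classic branching: a max-degree vertex u is either in the cover,
    # or all of its (active) neighbors are.
    u = max(active, key=lambda v: sum(1 for w in adj[v] if w in active))
    nbrs = {w for w in adj[u] if w in active}
    return min(1 + mvc(adj, active - {u}),
               len(nbrs) + mvc(adj, active - {u} - nbrs))
```

In the GPU setting each component becomes an independent subproblem pushed onto the shared work queue; the summation cannot happen until all sibling components finish, which is the load-balancing challenge the descendant-node post-processing strategy addresses.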
Results: Experiments demonstrate over 2000× speedup versus the state-of-the-art GPU method; solution time for complex graphs drops from >6 hours to several seconds. Memory consumption is significantly reduced, enabling substantially higher worker concurrency.
📝 Abstract
Algorithms for finding minimum or bounded vertex covers in graphs use a branch-and-reduce strategy, which involves exploring a highly imbalanced search tree. Prior GPU solutions assign different thread blocks to different sub-trees, while using a shared worklist to balance the load. However, these prior solutions do not scale to large and complex graphs because their unawareness of when the graph splits into components causes them to solve these components redundantly. Moreover, their high memory footprint limits the number of workers that can execute concurrently. We propose a novel GPU solution for vertex cover problems that detects when a graph splits into components and branches on the components independently. Although the need to aggregate the solutions of different components introduces non-tail-recursive branches which interfere with load balancing, we overcome this challenge by delegating the post-processing to the last descendant of each branch. We also reduce the memory footprint by reducing the graph and inducing a subgraph before exploring the search tree. Our solution substantially outperforms the state-of-the-art GPU solution, finishing in seconds when the state-of-the-art solution exceeds 6 hours. To the best of our knowledge, our work is the first to parallelize non-tail-recursive branching patterns on GPUs in a load-balanced manner.
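The delegation trick in the abstract can be sketched as follows. This is a simplified CPU analog, not the paper's CUDA code: each non-tail-recursive branch node carries a pending-children counter, and whichever worker completes the *last* child performs that node's aggregation and propagates upward, so no worker ever idles waiting on siblings (on a GPU the lock would be an atomic counter). The class and function names are hypothetical:

```python
import threading

class BranchNode:
    """A non-tail-recursive branch whose children's results must be summed."""
    def __init__(self, parent, n_children):
        self.parent = parent
        self.pending = n_children   # children not yet completed
        self.partial = 0            # running sum of child results
        self.lock = threading.Lock()

def complete(node, value, results):
    """Report a finished child's result. The worker that completes the last
    child of a node performs that node's post-processing (here: summation)
    and recurses toward the root; all other workers return immediately."""
    while node is not None:
        with node.lock:
            node.partial += value
            node.pending -= 1
            is_last = node.pending == 0
        if not is_last:
            return              # a later sibling's worker will aggregate
        value, node = node.partial, node.parent
    results.append(value)       # root finished: record the final answer
```

For example, two workers finishing the two children of a root node with results 3 and 4 would leave `results == [7]`; only the second worker to arrive performs the aggregation.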