AI Summary
The maximal biclique enumeration (MBE) problem on large-scale graphs suffers from high computational complexity and poor scalability. To address this, we propose cuMBE, the first efficient GPU-parallel algorithm for MBE. Our method eliminates recursion entirely via a compact array-based data structure, and combines coarse-grained task partitioning across thread blocks, fine-grained intra-block optimizations, and dynamic work-stealing to mitigate load imbalance and memory-overhead bottlenecks. cuMBE achieves geometric-mean speedups of 4.02x and 4.13x over the best sequential and multicore CPU algorithms, respectively, on common and real-world datasets. This advance significantly improves the scalability and practical applicability of MBE computation.
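The summary mentions dynamic work-stealing to rebalance load. The paper's GPU implementation is not reproduced here; the following is a minimal single-threaded simulation of the general idea (idle workers stealing half the tasks of the busiest peer), with all names (`run_with_stealing`, `cost`) being illustrative, not from the paper.

```python
from collections import deque

def run_with_stealing(task_lists, cost):
    """Simulate dynamic work-stealing among workers.

    Each worker owns a deque of tasks; when its deque runs empty it steals
    half the tasks of the busiest peer. This mirrors only the spirit (not
    the implementation) of how cuMBE rebalances sub-search-trees across
    GPU thread blocks.
    """
    deques = [deque(tasks) for tasks in task_lists]
    done = [0] * len(deques)  # total work completed per worker
    # Round-robin simulation: each step, every worker consumes one task.
    while any(deques):
        for w, dq in enumerate(deques):
            if not dq:
                # Idle: steal from the worker with the most pending tasks.
                victim = max(range(len(deques)), key=lambda i: len(deques[i]))
                if len(deques[victim]) > 1:
                    for _ in range(len(deques[victim]) // 2):
                        dq.append(deques[victim].pop())  # steal from the tail
            if dq:
                done[w] += cost(dq.popleft())  # owner consumes from the head
    return done
```

In cuMBE itself this balancing happens at thread-block granularity on the GPU; the owner/thief split above (owner pops one end, thief steals from the other) is the classic work-stealing-deque discipline.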
Abstract
Maximal Biclique Enumeration (MBE) is of central importance in graph theory, with applications across fields such as bioinformatics, social networks, and recommendation systems. However, its computational complexity makes it difficult to scale efficiently to large graphs. To address these challenges, we introduce cuMBE, a GPU-optimized parallel algorithm for MBE. Using a unique data structure called the compact array, cuMBE eliminates the need for recursion, thereby significantly reducing dynamic memory requirements and computational overhead. The algorithm adopts a hybrid parallelism approach, in which GPU thread blocks handle coarse-grained tasks associated with parts of the search process. In addition, we implement three fine-grained optimizations within each thread block to enhance performance. Further, we integrate a work-stealing mechanism to mitigate workload imbalance among thread blocks. Our experiments show that cuMBE achieves geometric-mean speedups of 4.02x and 4.13x over the state-of-the-art serial algorithm and the best parallel CPU-based algorithm, respectively, on common and real-world datasets.
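The abstract's key structural claim is that the search runs without recursion. cuMBE's compact array itself is not shown here; as a hedged illustration of the underlying idea, the sketch below runs a classic MBEA-style set-enumeration for maximal bicliques with an explicit stack of search frames instead of recursive calls (CPU-side Python, not the paper's GPU data layout).

```python
from collections import defaultdict

def maximal_bicliques(edges):
    """Enumerate maximal bicliques of a bipartite graph given as (u, v) edges.

    Iterative variant of the classic MBEA-style search: recursion is replaced
    by an explicit stack of frames (L, R, P, Q), where L is the current left
    side, R the current right side, P the candidate right vertices, and Q the
    excluded right vertices used for maximality checks.
    """
    adj = defaultdict(set)  # right vertex -> set of left neighbours
    left = set()
    for u, v in edges:
        adj[v].add(u)
        left.add(u)

    results = []
    stack = [[frozenset(left), frozenset(), sorted(adj), []]]
    while stack:
        L, R, P, Q = stack[-1]
        if not P:
            stack.pop()             # frame exhausted: "return" to the parent
            continue
        v = P.pop(0)                # next candidate of the current frame
        Lp = L & adj[v]             # left side shrinks to v's neighbours
        if not Lp:
            continue
        Rp, Pp, Qp = set(R) | {v}, [], []
        maximal = True
        for q in Q:                 # an excluded vertex covering Lp => not maximal
            if Lp <= adj[q]:
                maximal = False
                break
            if Lp & adj[q]:
                Qp.append(q)
        if maximal:
            for p in P:
                if Lp <= adj[p]:    # fully connected candidate: absorb into R'
                    Rp.add(p)
                elif Lp & adj[p]:
                    Pp.append(p)
            results.append((Lp, frozenset(Rp)))
            if Pp:
                stack.append([Lp, frozenset(Rp), Pp, Qp])  # "recurse" by pushing
        Q.append(v)                 # v is excluded for the remaining siblings
    return results
```

On a GPU the point of such a conversion is that frame state can live in preallocated arrays shared by a thread block, avoiding per-thread call stacks and dynamic allocation; that is the role the compact array plays in cuMBE.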