🤖 AI Summary
This study addresses the computational bottleneck in counting balanced butterflies—i.e., balanced (2,2)-bicliques—in large-scale signed bipartite graphs, a critical task in higher-order structural analysis. To overcome the high computational cost of existing serial approaches, this work proposes the first efficient parallel algorithms tailored for multi-core CPUs and GPUs, namely M-BBC, G-BBC, and G-BBC++. These methods leverage fine-grained vertex-level parallelism, tile-based GPU shared-memory optimization, and dynamic scheduling to avoid generating unbalanced substructures and to mitigate load imbalance. Experimental evaluation on 15 real-world datasets demonstrates that M-BBC achieves an average speedup of 38.13× over the state-of-the-art serial method, while the GPU-based variants deliver an average speedup of 2,600× with a peak of 13,320×, substantially outperforming existing solutions.
📝 Abstract
Balanced butterfly counting, i.e., counting balanced (2, 2)-bicliques, is a fundamental primitive in the analysis of signed bipartite graphs and provides a basis for studying higher-order structural properties such as clustering coefficients and community structure. Although prior work has proposed an efficient CPU-based serial method for counting balanced (2, k)-bicliques, the computational cost of balanced butterfly counting remains a major bottleneck on large-scale graphs. In this work, we present highly parallel implementations of balanced butterfly counting for both multi-core CPUs and GPUs. The proposed multi-core algorithm (M-BBC) employs fine-grained vertex-level parallelism to accelerate wedge-based counting while eliminating the generation of unbalanced substructures. To improve scalability, we develop a GPU-based method (G-BBC) that uses a tile-based parallel approach to effectively leverage shared memory while handling large vertex sets. We then present an improved variant, G-BBC++, which integrates dynamic scheduling to mitigate workload imbalance and maximize throughput. We conduct an experimental assessment of the proposed methods across 15 real-world datasets. Experimental results show that M-BBC achieves speedups of up to 71.13x (average 38.13x) over the sequential baseline BB2K. The GPU-based algorithms deliver even greater improvements, achieving up to 13,320x speedup (average 2,600x) over BB2K and outperforming M-BBC by up to 186x (average 50x). These results demonstrate the substantial scalability and efficiency of our parallel algorithms and establish a robust foundation for high-performance signed motif analysis on massive bipartite graphs.
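To make the wedge-based counting idea concrete, here is a minimal serial sketch (not the paper's parallel M-BBC/G-BBC algorithms, which are not reproduced here). It relies on the standard balance criterion for signed graphs: a butterfly is balanced when the product of its four edge signs is positive, so two wedges sharing the same endpoint pair form a balanced butterfly exactly when their sign products agree. All function and variable names below are illustrative assumptions, not identifiers from the paper.

```python
from collections import defaultdict
from itertools import combinations
from math import comb

def count_balanced_butterflies(edges):
    """Count balanced (2,2)-bicliques in a signed bipartite graph.

    edges: iterable of (u, v, sign) triples, where u is on one side,
    v on the other, and sign is +1 or -1.

    Wedge-based approach: a wedge v-u-w (two neighbors v, w of u) has
    sign s(u,v)*s(u,w). Two wedges with the same endpoint pair (v, w)
    form a butterfly; its four edge signs multiply to the product of
    the two wedge signs, so it is balanced iff both wedges share the
    same sign. Hence, per endpoint pair with p positive and n negative
    wedges, the balanced count is C(p,2) + C(n,2).
    """
    adj = defaultdict(list)          # u -> list of (neighbor, sign)
    for u, v, s in edges:
        adj[u].append((v, s))

    pos = defaultdict(int)           # endpoint pair -> positive-wedge count
    neg = defaultdict(int)           # endpoint pair -> negative-wedge count
    for nbrs in adj.values():
        for (v, s1), (w, s2) in combinations(nbrs, 2):
            key = (v, w) if v <= w else (w, v)  # canonical pair order
            if s1 * s2 > 0:
                pos[key] += 1
            else:
                neg[key] += 1

    return sum(comb(pos[k], 2) + comb(neg[k], 2)
               for k in set(pos) | set(neg))
```

The serial version materializes no unbalanced butterflies, mirroring the elimination of unbalanced substructures described above; the paper's contribution is parallelizing this wedge aggregation across CPU threads and GPU tiles.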