🤖 AI Summary
Parallel breadth-first search (BFS) on multicore systems suffers from irregular memory access patterns, load imbalance, and synchronization overhead. This paper presents a set of optimizations for parallel BFS that account for both hardware architecture and graph structure: a hybrid traversal strategy, a compact bitmap-based visited set, and a novel non-atomic distance update mechanism. The optimizations are evaluated on a 24-core Intel Xeon platform and a 128-core AMD EPYC system using synthetic and real-world graphs. On small-diameter graphs, the hybrid implementation achieves speedups of 3–8× on the Intel platform and 3–10× on the AMD system over a conventional parallel BFS. The paper also shows that the benefit of each optimization depends strongly on graph characteristics and hardware: on large-diameter graphs, results vary across platforms, and some optimizations degrade performance.
📝 Abstract
Breadth-first search (BFS) is a fundamental graph algorithm that is challenging to parallelize because of irregular memory access patterns, load imbalance, and synchronization overhead. In this paper, we introduce a set of optimization strategies for parallel BFS on multicore systems, including hybrid traversal, a bitmap-based visited set, and a novel non-atomic distance update mechanism. We evaluate these optimizations on two architectures, a 24-core Intel Xeon platform and a 128-core AMD EPYC system, using a diverse set of synthetic and real-world graphs. Our results demonstrate that the effectiveness of the optimizations varies significantly with graph characteristics and hardware architecture. For small-diameter graphs, our hybrid BFS implementation achieves speedups of 3–8× on the Intel platform and 3–10× on the AMD system compared to a conventional parallel BFS implementation. Performance on large-diameter graphs is more nuanced: some optimizations behave differently across platforms, and in some cases they degrade performance.