🤖 AI Summary
This work addresses the scalability challenges of large-scale graph processing. Single nodes offer limited memory and compute capacity, while existing distributed frameworks struggle with high communication latency, irregular memory access patterns, and synchronization overhead. To address these limitations, the authors propose a task-driven distributed graph processing approach that integrates the NWGraph graph algorithm library with the HPX asynchronous task-based runtime system. By relying on asynchronous task execution, the method aims to reduce synchronization costs and improve load balance. Initial evaluation on two representative algorithms, BFS and PageRank, shows that the approach outperforms the distributed Boost Graph Library (BGL) on BFS, while the current PageRank implementation does not yet outperform BGL. Overall, the results indicate the promise of asynchronous task-based models for scalable graph analytics and identify open challenges for future optimization.
📝 Abstract
Graphs are central to modeling relationships in scientific computing, data analysis, and AI/ML, but their growing scale can exceed the memory and compute capacity of single nodes, requiring distributed solutions. Existing distributed graph frameworks, however, face fundamental challenges: graph algorithms are latency-bound, suffer from irregular memory access, and often impose synchronization costs that limit scalability and efficiency. In this work, we present a distributed implementation of the NWGraph library integrated with the HPX runtime system. By leveraging HPX's asynchronous many-task model, our approach aims to reduce synchronization overhead, improve load balance, and provide a foundation for distributed graph analytics. We evaluate this approach using two representative algorithms: Breadth-First Search (BFS) and PageRank. Our initial results show that BFS achieves better performance than the distributed Boost Graph Library (BGL), while PageRank remains more challenging, with the current implementation not yet outperforming BGL. These findings highlight both the promise and the open challenges of applying asynchronous task-based runtimes to graph processing, and point to opportunities for future optimizations and extensions.
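To make the synchronization cost mentioned in the abstract concrete, the following is a minimal single-node sketch of level-synchronous BFS in plain C++. It is not the authors' implementation and uses neither the NWGraph nor the HPX API; the names `Graph`, `kUnreached`, and `bfs_levels` are hypothetical and chosen only for illustration. The comment inside the loop marks the point where a bulk-synchronous distributed version would typically need a global barrier, which is the kind of overhead an asynchronous many-task runtime aims to reduce.

```cpp
#include <cstddef>
#include <vector>

// Adjacency-list graph (hypothetical type): adj[u] lists the neighbors of u.
using Graph = std::vector<std::vector<std::size_t>>;

// Sentinel distance for vertices that are not reachable from the source.
constexpr std::size_t kUnreached = static_cast<std::size_t>(-1);

// Level-synchronous BFS: returns the hop distance from `source` to each vertex.
std::vector<std::size_t> bfs_levels(const Graph& adj, std::size_t source) {
    std::vector<std::size_t> dist(adj.size(), kUnreached);
    if (source >= adj.size()) {
        return dist;  // invalid source: nothing is reachable
    }

    dist[source] = 0;
    std::vector<std::size_t> frontier{source};
    std::size_t level = 0;

    while (!frontier.empty()) {
        std::vector<std::size_t> next;

        // Expand every vertex in the current frontier.
        for (std::size_t u : frontier) {
            for (std::size_t v : adj[u]) {
                if (dist[v] == kUnreached) {
                    dist[v] = level + 1;
                    next.push_back(v);
                }
            }
        }

        // In a bulk-synchronous distributed version, all nodes would wait
        // here until every partition has finished the current level before
        // exchanging frontiers. That wait is the synchronization cost.
        frontier.swap(next);
        ++level;
    }
    return dist;
}
```

Calling `bfs_levels(g, 0)` on a small adjacency list returns one distance per vertex, with `kUnreached` for vertices in other components. Because the frontier structure makes the per-level dependency explicit, it is a convenient baseline against which to compare asynchronous formulations.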