AI Summary
Large-scale graphs often exceed main-memory capacity, yet existing out-of-core graph processing systems suffer from inefficient I/O (high read and work amplification) and from severe synchronization stalls caused by rigidly synchronized iterations, which leaves SSDs underutilized. To address these challenges, this paper proposes ACGraph, a novel asynchronous out-of-core graph processing framework designed for SSDs. Its core contributions are: (1) a workload-aware, dynamic block-level priority scheduler coupled with an online asynchronous worklist, which significantly reduces redundant disk accesses; and (2) deep pipelining of computation and asynchronous I/O, enhanced by a hybrid storage format (optimized for low-degree vertex access) and an in-memory reuse mechanism for active blocks, which sustains high SSD throughput. Evaluated on BFS, WCC, and PageRank, ACGraph achieves an average 2.3× speedup and 41% higher I/O efficiency over state-of-the-art systems.
Abstract
Graphs are a ubiquitous data structure in diverse domains such as machine learning, social networks, and data mining. As real-world graphs continue to grow beyond the memory capacity of single machines, out-of-core graph processing systems have emerged as a viable solution. Yet, existing systems that rely on strictly synchronous, iteration-by-iteration execution incur significant overheads. In particular, their scheduling mechanisms lead to I/O inefficiencies, stemming from read and work amplification, and induce costly synchronization stalls that hinder sustained disk utilization. To overcome these limitations, we present ACGraph, a novel asynchronous graph processing system optimized for SSD-based environments with constrained memory resources. ACGraph employs a dynamic, block-centric priority scheduler that adjusts in real time based on workload, along with an online asynchronous worklist that minimizes redundant disk accesses by efficiently reusing active blocks in memory. Moreover, ACGraph unifies asynchronous I/O with computation in a pipelined execution model that maintains sustained I/O activation, and leverages a highly optimized hybrid storage format to expedite access to low-degree vertices. We implement popular graph algorithms, such as Breadth-First Search (BFS), Weakly Connected Components (WCC), personalized PageRank (PPR), PageRank (PR), and $k$-core on ACGraph and demonstrate that ACGraph substantially outperforms state-of-the-art out-of-core graph processing systems in both runtime and I/O efficiency.
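The block-centric, asynchronous scheduling idea described in the abstract can be illustrated with a small in-memory sketch. This is not ACGraph's implementation: the function name `async_block_bfs`, the fixed-size vertex blocks, and the choice of "number of active vertices" as the block priority are all assumptions made for illustration, and a counter stands in for actual SSD reads. The sketch only shows how a worklist keyed by block, with no per-iteration barrier, can keep working on a loaded block until it has no active vertices left.

```python
import heapq
from collections import defaultdict


def async_block_bfs(adj, source, block_size):
    """Hop distances from `source`, driven by a block-level priority worklist.

    Hypothetical sketch: vertices are grouped into blocks of `block_size`
    consecutive ids, and `block_loads` counts how often a block is fetched
    (a stand-in for an SSD read). The priority function is an assumption.
    """
    dist = [float("inf")] * len(adj)
    dist[source] = 0

    active = defaultdict(set)  # block id -> active vertices in that block
    src_block = source // block_size
    active[src_block].add(source)

    # Max-heap via negated counts; stale entries are skipped when popped.
    heap = [(-1, src_block)]
    block_loads = 0

    while heap:
        _, block = heapq.heappop(heap)
        if not active[block]:
            continue  # stale entry: block was already drained
        block_loads += 1  # "load" the block once

        # Reuse the loaded block: drain every vertex that becomes active
        # in it, including ones activated while it is being processed.
        while active[block]:
            v = active[block].pop()
            for w in adj[v]:
                if dist[v] + 1 < dist[w]:  # label-correcting update, no barrier
                    dist[w] = dist[v] + 1
                    w_block = w // block_size
                    active[w_block].add(w)
                    if w_block != block:
                        heapq.heappush(heap, (-len(active[w_block]), w_block))

    return dist, block_loads
```

Calling `async_block_bfs(adj, 0, 2)` on an adjacency list returns the distance array together with the number of simulated block loads, which makes it easy to compare how different priority choices affect redundant loads on a toy graph.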