Accelerating Loading WebGraphs in ParaGrapher

📅 2025-07-01
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
ParaGrapher faces two bottlenecks when loading large-scale compressed graphs: low utilization of storage bandwidth, and decompression bandwidth that drops as the compression ratio increases. To address these, we propose PG-Fuse, a user-space filesystem built on FUSE, and CompBin, a compact binary storage format. PG-Fuse improves storage access by requesting larger blocks, reducing the number of calls to the underlying filesystem, and caching received blocks in memory for later requests. CompBin is a compact binary representation of the CSR format that allows direct access to neighbors without storing unused bytes. Together, they target both the I/O bottleneck and the decompression bottleneck. Evaluated on 12 real-world and synthetic graphs with up to 128 billion edges, PG-Fuse and CompBin achieve up to 7.6× and 21.8× speedup, respectively, reducing the time graph processing frameworks spend loading very large compressed graphs.
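According to the paper's abstract, PG-Fuse requests larger blocks from the underlying filesystem, issues fewer calls to it, and caches the received blocks in memory. The Python sketch below illustrates only that caching idea, outside of FUSE. The class name `BlockCache`, the LRU eviction policy, and the block sizes are assumptions made for illustration; they are not PG-Fuse's actual implementation or parameters.

```python
from collections import OrderedDict
from typing import Callable


class BlockCache:
    """Hypothetical sketch: serve small reads from large, cached blocks."""

    def __init__(self, backing_read: Callable[[int, int], bytes],
                 block_size: int, max_blocks: int):
        self.backing_read = backing_read  # (offset, length) -> bytes from the real file
        self.block_size = block_size      # large request size sent to the backing store
        self.max_blocks = max_blocks      # cache capacity in blocks (assumed LRU policy)
        self.blocks: "OrderedDict[int, bytes]" = OrderedDict()
        self.backing_calls = 0            # counts calls to the underlying filesystem

    def _block(self, idx: int) -> bytes:
        if idx in self.blocks:            # hit: no call to the backing store
            self.blocks.move_to_end(idx)
            return self.blocks[idx]
        data = self.backing_read(idx * self.block_size, self.block_size)
        self.backing_calls += 1
        self.blocks[idx] = data
        if len(self.blocks) > self.max_blocks:
            self.blocks.popitem(last=False)  # evict the least recently used block
        return data

    def read(self, offset: int, length: int) -> bytes:
        """Assemble [offset, offset + length) from one or more cached blocks."""
        out = bytearray()
        end = offset + length
        while offset < end:
            idx, within = divmod(offset, self.block_size)
            chunk = self._block(idx)[within:within + (end - offset)]
            if not chunk:                 # reached the end of the file
                break
            out += chunk
            offset += len(chunk)
        return bytes(out)


# Usage: 32 small reads of 16 bytes reach the backing store only twice.
data = bytes(range(256)) * 4
cache = BlockCache(lambda off, n: data[off:off + n], block_size=256, max_blocks=2)
for off in range(0, 512, 16):
    cache.read(off, 16)
print(cache.backing_calls)  # 2
```

In a real FUSE filesystem, logic like this would sit behind the read handler, so many small reads issued by the graph loader become a few large requests to the storage device.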

πŸ“ Abstract
ParaGrapher is a graph loading API and library that enables graph processing frameworks to load large-scale compressed graphs with minimal overhead. This capability accelerates the design and implementation of new high-performance graph algorithms and their evaluation on a wide range of graphs and across different frameworks. However, our previous study identified two major limitations in ParaGrapher: inefficient utilization of high-bandwidth storage and reduced decompression bandwidth due to increased compression ratios. To address these limitations, we present two optimizations for ParaGrapher in this paper. To improve storage utilization, particularly for high-bandwidth storage, we introduce ParaGrapher-FUSE (PG-Fuse), a filesystem based on FUSE (Filesystem in Userspace). PG-Fuse optimizes storage access by increasing the size of requested blocks, reducing the number of calls to the underlying filesystem, and caching the received blocks in memory for future calls. To improve the decompression bandwidth, we introduce CompBin, a compact binary representation of the CSR format. CompBin enables direct access to neighbors while avoiding storage of unused bytes. Our evaluation on 12 real-world and synthetic graphs with up to 128 billion edges shows that PG-Fuse and CompBin achieve up to 7.6 and 21.8 times speedup, respectively.
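The abstract describes CompBin only as a compact binary CSR representation that allows direct access to neighbors without storing unused bytes. One plausible reading, assumed here and not confirmed by the source, is that each vertex ID is stored in the minimum number of bytes needed for the graph's vertex count instead of a fixed 4 or 8 bytes. The Python sketch below (all function names hypothetical) shows how such a fixed-width packed neighbor array still supports direct access: the neighbors of vertex `v` occupy bytes `offsets[v] * w` through `offsets[v + 1] * w`.

```python
def bytes_per_id(num_vertices: int) -> int:
    """Smallest byte width that can hold every vertex ID in [0, num_vertices - 1]."""
    max_id = max(num_vertices - 1, 1)
    return (max_id.bit_length() + 7) // 8


def encode_neighbors(neighbors: list[int], num_vertices: int) -> tuple[int, bytes]:
    """Pack the CSR neighbor array with a fixed, minimal width per ID (assumed layout)."""
    w = bytes_per_id(num_vertices)
    buf = bytearray()
    for v in neighbors:
        buf += v.to_bytes(w, "little")  # no unused high-order bytes are stored
    return w, bytes(buf)


def neighbors_of(v: int, offsets: list[int], w: int, buf: bytes) -> list[int]:
    """Read v's neighbors directly, with no sequential decoding of earlier lists."""
    start, end = offsets[v] * w, offsets[v + 1] * w
    return [int.from_bytes(buf[i:i + w], "little") for i in range(start, end, w)]


# Usage: 3 vertices, edges 0->1, 0->2, 1->0; each ID fits in a single byte.
offsets = [0, 2, 3, 3]
w, buf = encode_neighbors([1, 2, 0], num_vertices=3)
print(w, neighbors_of(0, offsets, w, buf))  # 1 [1, 2]
```

Because every ID has the same width, locating a neighbor list needs only the CSR offsets and one multiplication, unlike variable-length codes that must be decoded in order.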
Problem

Research questions and friction points this paper is trying to address.

Improve storage utilization in high-bandwidth systems
Enhance decompression bandwidth for compressed graphs
Optimize graph loading for large-scale processing frameworks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces ParaGrapher-FUSE for optimized storage access
Develops CompBin for improved decompression bandwidth
Enables efficient large-scale compressed graph loading