Selective Parallel Loading of Large-Scale Compressed Graphs with ParaGrapher

Published: 2024-04-30
Venue: arXiv.org
Citations: 2
Influential: 1
AI Summary
To address cross-framework incompatibility of graph data formats and slow loading of large-scale real-world graphs, this paper introduces ParaGrapher, the first parallel graph loading library supporting shared-memory, distributed-memory, and out-of-core scenarios. Methodologically, it features (1) a fine-grained parallel decompression mechanism and a performance model tailored to the WebGraph compression format, and (2) on-demand loading through a unified API across graph processing frameworks. By integrating parallel decompression, graph tile scheduling, multi-backend storage adaptation (binary, text, WebGraph), and optimized in-memory/out-of-core access, ParaGrapher achieves up to 3.2× faster loading and up to 5.2× end-to-end speedup for graph algorithms when decompressing WebGraph-format graphs, compared with binary and textual formats. The implementation is open source and designed for integration with mainstream graph computing frameworks.
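The idea of on-demand, parallel loading can be illustrated with a minimal sketch. It is not ParaGrapher's API: the function names (`compress_blocks`, `load_block`, `load_vertex_range`), the fixed-size vertex blocks, and the use of `zlib` as a stand-in for the WebGraph codec are all assumptions made for illustration. The sketch stores a CSR graph as independently compressed blocks and decompresses, in parallel, only the blocks that overlap a requested vertex range.

```python
import zlib
from array import array
from concurrent.futures import ThreadPoolExecutor


def compress_blocks(offsets, edges, block_size):
    """Compress the neighbour lists of each fixed-size vertex block separately.

    zlib is only a stand-in here (assumption); a real loader would use a
    graph-specific codec such as WebGraph.
    """
    num_vertices = len(offsets) - 1
    blocks = []
    for start in range(0, num_vertices, block_size):
        end = min(start + block_size, num_vertices)
        chunk = array("I", edges[offsets[start]:offsets[end]])
        blocks.append(zlib.compress(chunk.tobytes()))
    return blocks


def load_block(blocks, index):
    """Decompress one block and return (block index, list of edge targets)."""
    out = array("I")
    out.frombytes(zlib.decompress(blocks[index]))
    return index, list(out)


def load_vertex_range(blocks, block_size, first_vertex, last_vertex, workers=4):
    """Decompress only the blocks overlapping [first_vertex, last_vertex)."""
    first_block = first_vertex // block_size
    last_block = (last_vertex - 1) // block_size
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda i: load_block(blocks, i),
                           range(first_block, last_block + 1))
        return dict(results)


# Hypothetical 6-vertex graph in CSR form (offsets index into edges).
offsets = [0, 2, 3, 3, 5, 6, 8]
edges = [1, 2, 2, 0, 4, 5, 0, 3]
blocks = compress_blocks(offsets, edges, block_size=2)

# Request vertices 2..5 only; block 0 (vertices 0 and 1) is never decompressed.
loaded = load_vertex_range(blocks, block_size=2, first_vertex=2, last_vertex=6)
print(loaded)
```

Compressing blocks independently is what makes both selectivity and parallelism possible in this sketch: any block can be decoded without touching its neighbours, at the cost of a somewhat lower compression ratio than a single stream would give.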

Abstract
Comprehensive evaluation is one of the foundations of experimental science. In High-Performance Graph Processing, a thorough evaluation of contributions becomes more achievable when common input formats are supported across different frameworks. However, each framework creates its own format, which may not support reading large-scale real-world graph datasets. This creates a demand for high-performance graph-loading libraries that (i) accelerate the design of new graph algorithms, (ii) allow contributions to be evaluated on a wide range of graph algorithms, and (iii) facilitate easy and fast comparison across different graph frameworks. To that end, we present ParaGrapher, a high-performance API and library for loading large-scale and compressed graphs. ParaGrapher supports different types of requests for accessing graphs in shared-memory, distributed-memory, and out-of-core graph processing. We explain the design of ParaGrapher and present a performance model of graph decompression, which is used to evaluate ParaGrapher over three storage types. Our evaluation shows that by decompressing compressed graphs in WebGraph format, ParaGrapher delivers up to 3.2 times speedup in loading and up to 5.2 times speedup in end-to-end execution in comparison to the binary and textual formats. ParaGrapher is available online at https://blogs.qub.ac.uk/DIPSA/ParaGrapher/.
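To see why loading a compressed graph can beat loading a raw binary file, a simple bottleneck model helps. The sketch below is not the performance model from the paper; it is a back-of-the-envelope assumption in which reading and decompression overlap perfectly, so load time is set by the slower of the two stages. All function names, parameters, and numbers are hypothetical.

```python
def load_time_uncompressed(size_bytes, storage_bw):
    """Seconds to read an uncompressed graph: limited by storage bandwidth only."""
    return size_bytes / storage_bw


def load_time_compressed(size_bytes, ratio, storage_bw, decomp_bw_per_thread, threads):
    """Seconds to load a compressed graph under a perfect-overlap assumption.

    size_bytes           -- size of the graph once decompressed
    ratio                -- compression ratio (decompressed size / compressed size)
    storage_bw           -- storage read bandwidth in bytes per second
    decomp_bw_per_thread -- decompressed bytes produced per second per thread
    threads              -- number of decompression threads (assumed to scale linearly)
    """
    read_time = (size_bytes / ratio) / storage_bw
    decompress_time = size_bytes / (decomp_bw_per_thread * threads)
    return max(read_time, decompress_time)


# Hypothetical numbers chosen for illustration, not measurements from the paper.
size = 100e9                      # 100 GB decompressed
raw = load_time_uncompressed(size, storage_bw=1e9)
packed = load_time_compressed(size, ratio=5, storage_bw=1e9,
                              decomp_bw_per_thread=0.5e9, threads=8)
print(f"uncompressed: {raw:.1f} s, compressed: {packed:.1f} s, "
      f"speedup: {raw / packed:.2f}x")
```

Under these assumptions the compressed path wins whenever storage is slow relative to aggregate decompression throughput. On fast storage or with few threads, decompression becomes the bottleneck and the advantage shrinks or disappears.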

Problem

Research questions and friction points this paper is trying to address.

Loading large-scale compressed graphs efficiently across different frameworks
Accelerating new graph algorithm design and evaluation processes
Enabling fast performance comparisons between various graph processing systems

Innovation

Methods, ideas, or system contributions that make the work stand out.

ParaGrapher loads large-scale compressed graphs efficiently
It supports shared-memory, distributed-memory, and out-of-core processing
It speeds up loading by up to 3.2× and end-to-end execution by up to 5.2× compared with binary and textual formats