🤖 AI Summary
When dataset sizes exceed GPU memory capacity, CPU-to-GPU data transfers constrained by PCIe bandwidth become a critical performance bottleneck. This work proposes ZipFlow, a framework that, for the first time, classifies compression algorithms into three distinct patterns based on their parallelization characteristics and introduces compiler-driven scheduling strategies tailored to each pattern. This approach enables efficient compressed data movement across diverse GPU architectures together with end-to-end query optimization. Evaluated on the TPC-H benchmark, ZipFlow achieves a 2.08× average speedup over nvCOMP and outperforms CPU-based engines such as DuckDB by 3.14×.
📝 Abstract
In GPU-accelerated data analytics, data transfer from CPU to GPU becomes a performance bottleneck when data scales beyond GPU memory capacity, owing to limited PCIe bandwidth. Data compression comes to the rescue, reducing the amount of data transferred while exploiting powerful GPU computation for decompression. To optimize end-to-end query performance, however, the workflow of data compression, transfer, and decompression must be designed holistically, based on the compression strategy and hardware characteristics, to balance I/O latency against computational overhead. In this work, we present ZipFlow, a compiler-based framework for optimizing compressed data transfer in GPU-accelerated data analytics. ZipFlow classifies compression algorithms into three distinct patterns based on their inherent parallelism. For each pattern, ZipFlow employs generalized scheduling strategies that effectively exploit the computational power of GPUs across diverse architectures. Building on these patterns, ZipFlow delivers flexible, high-performance, and holistic end-to-end optimization of compressed data transfer. We evaluate ZipFlow on the industry-standard TPC-H benchmark. Overall, ZipFlow achieves an average speedup of 2.08× over the state-of-the-art GPU compression library (nvCOMP) and a 3.14× speedup over CPU-based query processing engines (e.g., DuckDB).
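The core workflow the abstract describes, splitting data into independently compressed chunks so that transfer and decompression can be overlapped, can be illustrated with a minimal CPU-only sketch. This is not ZipFlow's implementation: `zlib` stands in for a GPU codec, and the `transfer` function is a hypothetical placeholder for the PCIe host-to-device copy.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

CHUNK = 64 * 1024  # fixed chunk size; smaller chunks expose more pipeline parallelism

def compress_chunks(data: bytes) -> list[bytes]:
    """Split the payload into fixed-size chunks and compress each independently,
    so chunks can be transferred and decompressed as soon as they arrive."""
    return [zlib.compress(data[i:i + CHUNK]) for i in range(0, len(data), CHUNK)]

def transfer(chunk: bytes) -> bytes:
    """Hypothetical stand-in for the PCIe copy (the I/O stage); identity here."""
    return chunk

def pipeline_decompress(chunks: list[bytes]) -> bytes:
    """Overlap the I/O stage with the compute stage: while one chunk is being
    decompressed, the next is already 'in flight'."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        transferred = pool.map(transfer, chunks)                 # I/O stage (lazy)
        return b"".join(pool.map(zlib.decompress, transferred))  # compute stage

data = bytes(range(256)) * 2048          # ~512 KiB of compressible input
compressed = compress_chunks(data)
restored = pipeline_decompress(compressed)
assert restored == data                  # round trip is lossless
ratio = sum(len(c) for c in compressed) / len(data)  # <1.0 means less PCIe traffic
```

The chunk size is the knob the abstract alludes to: larger chunks compress better but serialize the pipeline, while smaller chunks improve overlap at the cost of per-chunk overhead, which is exactly the I/O-versus-compute balance ZipFlow's scheduling strategies target.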