GPU Acceleration of SQL Analytics on Compressed Data

📅 2025-06-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the low execution efficiency of SQL analytical queries under GPU memory constraints, this paper proposes performing native GPU SQL computation directly on lightweight compressed data, including Run-Length Encoding (RLE), dictionary encoding, and bit-width reduction, thereby eliminating decompression overhead and overcoming GPU memory bottlenecks. Key contributions include: (1) the first multi-RLE column co-processing mechanism; (2) a unified tensorized processing framework supporting heterogeneous encoding formats (RLE, dictionary, index-based, and bit-width-reduced representations); and (3) a PyTorch-based implementation enabling portable cross-device (CPU/GPU) execution. Experiments on real-world production datasets that do not fit entirely in GPU memory demonstrate over 10× end-to-end query speedup compared to leading commercial CPU-based analytical systems.
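As a concrete illustration of computing directly on compressed data, an aggregate such as SUM over an RLE column needs only the (value, run-length) pairs and never materializes the decompressed rows. The sketch below is a plain-Python illustration of this idea, not the paper's code; the paper expresses such operators as PyTorch tensor operations, and `rle_sum` is a name of our choosing.

```python
# SUM over an RLE-compressed column without decompression.
# An RLE column stores (value, run_length) pairs; the sum of the
# decompressed column is simply sum(value * run_length), so the
# work is proportional to the number of runs, not the number of rows.

def rle_sum(values, run_lengths):
    return sum(v * r for v, r in zip(values, run_lengths))

# Example: decompressed column would be [5, 5, 5, 2, 7, 7].
print(rle_sum([5, 2, 7], [3, 1, 2]))  # → 31
```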

📝 Abstract
GPUs are uniquely suited to accelerate SQL analytics workloads thanks to their massive compute parallelism and High Bandwidth Memory (HBM) -- when datasets fit in GPU HBM, performance is unparalleled. Unfortunately, GPU HBMs typically remain small compared with lower-bandwidth CPU main memory. Besides brute-force scaling across many GPUs, current solutions for accelerating queries on large datasets include partitioning the data and loading smaller batches into GPU HBM, and hybrid execution with a connected device (e.g., CPUs). Unfortunately, these approaches are exposed to the limitations of lower main-memory and host-to-device interconnect bandwidths, introduce additional I/O overheads, or incur higher costs. This is a substantial obstacle to scaling GPU adoption to larger datasets. Data compression can alleviate this bottleneck, but to avoid paying for costly decompression/decoding, an ideal solution must include computation primitives that operate directly on data in compressed form. This is the focus of our paper: a set of new methods for running queries directly on lightweight compressed data using schemes such as Run-Length Encoding (RLE), index encoding, bit-width reduction, and dictionary encoding. Our novelty includes operating on multiple RLE columns without decompression, handling heterogeneous column encodings, and leveraging PyTorch tensor operations for portability across devices. Experimental evaluations show speedups of an order of magnitude compared to state-of-the-art commercial CPU-only analytics systems, for real-world queries on a production dataset that would not fit into GPU memory uncompressed. This work paves the road for GPU adoption in a much broader set of use cases, and it is complementary to most other scale-out or fallback mechanisms.
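Dictionary encoding, one of the schemes the abstract lists, admits the same "compute on compressed form" trick: a predicate on the original values can be rewritten as a predicate on the small integer codes, so the wide values are never decoded per row. A minimal sketch under our own assumptions (a dense dictionary where code `i` maps to `dictionary[i]`; the helper name `dict_filter` is hypothetical, and the paper's version runs as tensor operations):

```python
# Filter a dictionary-encoded column without decoding it.
# codes[row] is an integer index into `dictionary`. The predicate is
# evaluated once per distinct dictionary entry (|dictionary| times),
# then only the compact integer codes are scanned (|codes| times).

def dict_filter(codes, dictionary, predicate):
    qualifying = {c for c, v in enumerate(dictionary) if predicate(v)}
    return [row for row, c in enumerate(codes) if c in qualifying]

# Example: a country column encoded against ["DE", "FR", "US"].
codes = [2, 0, 1, 2]  # decoded: US, DE, FR, US
print(dict_filter(codes, ["DE", "FR", "US"], lambda v: v == "US"))  # → [0, 3]
```

The same rewrite applies to joins and group-bys: as long as the operator only needs equality, it can run on codes and consult the dictionary once at the end.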
Problem

Research questions and friction points this paper is trying to address.

Accelerating SQL analytics on compressed data using GPUs
Overcoming GPU memory limitations for large datasets
Enabling direct query execution on compressed data formats
Innovation

Methods, ideas, or system contributions that make the work stand out.

GPU-accelerated SQL analytics on compressed data
Direct query execution on lightweight compressed formats
Leveraging PyTorch tensor operations for cross-device portability
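The multi-RLE co-processing idea in the list above can be sketched as a merge of run boundaries: two RLE columns with the same decompressed length are aligned into common runs, so a binary operator (e.g. `A + B` or `A == B`) executes once per aligned run rather than once per row. The two-pointer sketch below is our own plain-Python illustration with a function name of our choosing; the paper implements this logic with tensorized PyTorch operations.

```python
# Align two RLE columns of equal decompressed length into common runs.
# Returns (a_val, b_val, run_len) triples covering the whole column,
# so per-row operators can be applied run-by-run, never row-by-row.

def align_rle(a_vals, a_runs, b_vals, b_runs):
    i = j = 0
    ra, rb = a_runs[0], b_runs[0]  # remaining length of current runs
    out = []
    while True:
        step = min(ra, rb)         # advance to the nearer run boundary
        out.append((a_vals[i], b_vals[j], step))
        ra -= step
        rb -= step
        if ra == 0:
            i += 1
            if i == len(a_vals):   # equal lengths: B is exhausted too
                break
            ra = a_runs[i]
        if rb == 0:
            j += 1
            rb = b_runs[j]
    return out

# A decompresses to [1, 1, 1, 2, 2]; B decompresses to [9, 9, 8, 8, 8].
print(align_rle([1, 2], [3, 2], [9, 8], [2, 3]))
# → [(1, 9, 2), (1, 8, 1), (2, 8, 2)]
```

Note the output stays in RLE form (three runs instead of five rows), so downstream operators keep the compression benefit.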
Authors

Zezhou Huang
Columbia University (internship at Microsoft Gray Systems Lab)
Krystian Sakowski
Microsoft
Hans Lehnert
Microsoft
Wei Cui
Microsoft
C. Curino
Microsoft
Matteo Interlandi
Microsoft
Large scale data processing, machine learning
Marius Dumitru
Microsoft
Rathijit Sen
Microsoft Gray Systems Lab
Computer Architecture, Database Systems, Machine Learning