Rethinking Analytical Processing in the GPU Era

📅 2025-08-06
🤖 AI Summary
GPU-accelerated data analytics has seen limited adoption, and this paper argues that recent hardware and software advances have removed the main barriers. It proposes Sirius, an open-source, GPU-native SQL engine that treats the GPU as the primary execution engine. Sirius builds its relational operators on libcudf and accepts query plans in the standard Substrait representation, which lets it plug into existing systems such as DuckDB and Apache Doris without changing their user-facing interfaces. Its key contributions are: (1) GPU-native query execution with the GPU as the primary engine; and (2) drop-in acceleration of existing databases through Substrait. On TPC-H, Sirius achieves a 7× speedup when integrated with DuckDB on a single node at the same hardware rental cost, and up to a 12.5× speedup when integrated with Apache Doris in a distributed setting.

📝 Abstract
The era of GPU-powered data analytics has arrived. In this paper, we argue that recent advances in hardware (e.g., larger GPU memory, faster interconnect and IO, and declining cost) and software (e.g., composable data systems and mature libraries) have removed the key barriers that have limited the wider adoption of GPU data analytics. We present Sirius, a prototype open-source GPU-native SQL engine that offers drop-in acceleration for diverse data systems. Sirius treats GPU as the primary engine and leverages libraries like libcudf for high-performance relational operators. It provides drop-in acceleration for existing databases by leveraging the standard Substrait query representation, replacing the CPU engine without changing the user-facing interface. On TPC-H, Sirius achieves 7x speedup when integrated with DuckDB in a single node at the same hardware rental cost, and up to 12.5x speedup when integrated with Apache Doris in a distributed setting.
Problem

Research questions and friction points this paper is trying to address.

Limited adoption of GPU-accelerated data analytics
Overcoming hardware and software barriers in GPU adoption
Drop-in acceleration for existing database systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

GPU-native SQL engine for analytics
Leverages libcudf for relational operators
Uses Substrait for drop-in acceleration
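
The drop-in idea in the points above can be illustrated with a toy sketch. It is not Sirius code and uses neither Substrait nor libcudf: the plan format (a list of `(operator, argument)` tuples), the `RowEngine` and `ColumnEngine` classes, and the `Database.query` method are all hypothetical names invented for illustration. The only point it tries to show is that when two engines consume the same plan representation, the engine can be swapped without changing the interface the user calls.

```python
from typing import Any, Protocol

Row = dict[str, Any]
# Hypothetical stand-in for a Substrait plan: a list of (operator, argument) steps.
# Real Substrait plans are serialized protobuf messages, not Python tuples.
Plan = list[tuple[str, Any]]


class Engine(Protocol):
    def execute(self, plan: Plan, tables: dict[str, list[Row]]) -> list[Row]: ...


class RowEngine:
    """Stand-in for a host system's CPU engine: handles one row at a time."""

    def execute(self, plan: Plan, tables: dict[str, list[Row]]) -> list[Row]:
        rows: list[Row] = []
        for op, arg in plan:
            if op == "read":
                rows = list(tables[arg])
            elif op == "filter":
                column, threshold = arg
                rows = [r for r in rows if r[column] > threshold]
            elif op == "project":
                rows = [{c: r[c] for c in arg} for r in rows]
            else:
                raise ValueError(f"unsupported operator: {op}")
        return rows


class ColumnEngine:
    """Stand-in for an accelerator engine: handles whole columns per operator."""

    def execute(self, plan: Plan, tables: dict[str, list[Row]]) -> list[Row]:
        columns: dict[str, list[Any]] = {}
        for op, arg in plan:
            if op == "read":
                rows = tables[arg]
                # Pivot rows into columns; an empty table yields no columns.
                columns = {c: [r[c] for r in rows] for c in rows[0]} if rows else {}
            elif op == "filter":
                column, threshold = arg
                mask = [v > threshold for v in columns.get(column, [])]
                columns = {
                    c: [v for v, keep in zip(vals, mask) if keep]
                    for c, vals in columns.items()
                }
            elif op == "project":
                columns = {c: columns.get(c, []) for c in arg}
            else:
                raise ValueError(f"unsupported operator: {op}")
        names = list(columns)
        # Pivot columns back into rows for the caller.
        return [dict(zip(names, vals)) for vals in zip(*columns.values())]


class Database:
    """User-facing interface; it stays the same whichever engine is plugged in."""

    def __init__(self, engine: Engine, tables: dict[str, list[Row]]) -> None:
        self.engine = engine
        self.tables = tables

    def query(self, table: str, column: str, greater_than: Any,
              select: list[str]) -> list[Row]:
        # The front end builds the plan; the engine only ever sees the plan.
        plan: Plan = [
            ("read", table),
            ("filter", (column, greater_than)),
            ("project", select),
        ]
        return self.engine.execute(plan, self.tables)


# Made-up sample data; the column names only mimic TPC-H naming.
TABLES = {
    "lineitem": [
        {"l_orderkey": 1, "l_quantity": 17, "l_discount": 0.04},
        {"l_orderkey": 2, "l_quantity": 36, "l_discount": 0.09},
        {"l_orderkey": 3, "l_quantity": 8, "l_discount": 0.10},
        {"l_orderkey": 4, "l_quantity": 28, "l_discount": 0.06},
    ]
}

cpu_db = Database(RowEngine(), TABLES)
accel_db = Database(ColumnEngine(), TABLES)

cpu_result = cpu_db.query("lineitem", "l_quantity", 20, ["l_orderkey", "l_quantity"])
accel_result = accel_db.query("lineitem", "l_quantity", 20, ["l_orderkey", "l_quantity"])
```

Both `Database` objects are called the same way and return the same rows; only the engine behind the plan differs. In the setting the paper describes, the shared plan format would be Substrait and the accelerator's operators would be backed by libcudf, neither of which this sketch attempts to model.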
Authors
Bobbi Yogatama
University of Wisconsin-Madison
Yifei Yang
Shanghai Jiao Tong University
Natural Language Processing
Kevin Kristensen
University of Wisconsin-Madison
Devesh Sarda
University of Wisconsin-Madison
Abigale Kim
University of Wisconsin-Madison
Adrian Cockcroft
OrionX
Yu Teng
University of Wisconsin-Madison
Joshua Patterson
NVIDIA
Gregory Kimball
NVIDIA
Wes McKinney
Posit PBC
Weiwei Gong
Oracle
Xiangyao Yu
University of Wisconsin-Madison
Databases, Computer Architecture