Worst-Case Optimal GPU Datalog

📅 2026-04-21
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses a limitation of existing GPU-based Datalog engines: they rely on binary joins, whose intermediate results on multi-way queries can grow asymptotically beyond the AGM bound in both time and space, often leading to out-of-memory errors. Worst-case optimal joins (WCOJs) avoid this blowup in theory, but on GPUs their attribute-at-a-time intersections cause severe load imbalance under key skew. The authors present SRDatalog, the first GPU Datalog engine based on WCOJ, which combines flat columnar storage with a two-phase deterministic memory allocation scheme to avoid out-of-memory failures. SRDatalog also introduces root-level histogram-guided load balancing, structural helper-relation splitting, and stream-aligned rule multiplexing, all designed to fit the SIMT execution model. Evaluated on real-world program-analysis workloads, SRDatalog achieves geometric-mean speedups of 21× to 47× over state-of-the-art systems.
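The summary does not spell out how the two-phase allocation works. A common pattern that fits the description is count, prefix-sum, then write; the sketch below illustrates that generic pattern in sequential Python and is an assumption, not SRDatalog's actual scheme. The function name `two_phase_join` and the tuple layouts are hypothetical.

```python
from itertools import accumulate


def two_phase_join(left, right):
    """Join left(a, b) with right(b, c) on b using count-then-write allocation.

    Generic illustration only; the paper's allocation scheme may differ.
    """
    # Index the right relation by its join key b.
    index = {}
    for b, c in right:
        index.setdefault(b, []).append(c)

    # Phase 1: count how many output tuples each left tuple will produce.
    counts = [len(index.get(b, ())) for _, b in left]

    # A prefix sum over the counts yields each tuple's write offset, and the
    # final entry is the exact output size, known before anything is written.
    offsets = [0] + list(accumulate(counts))
    out = [None] * offsets[-1]  # one exactly sized allocation

    # Phase 2: each left tuple writes into its own disjoint slice of `out`.
    for i, (a, b) in enumerate(left):
        for j, c in enumerate(index.get(b, ())):
            out[offsets[i] + j] = (a, b, c)
    return out, offsets
```

Because the output size is fixed before phase 2 starts, the buffer never has to grow mid-join, and on a GPU each thread could write to its slice without synchronizing on a shared cursor.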

📝 Abstract
Datalog is a declarative logic-programming language used for complex analytic reasoning workloads such as program analysis and graph analytics. Datalog's popularity stems from its unique trade-off: it marries logic-defined specification with the potential for massive data parallelism. While traditional engines are CPU-based, the memory-bound nature of Datalog has led to increasing interest in leveraging GPUs. GPU engines beat CPU-based engines by operationalizing iterated relational joins via SIMT-friendly join algorithms. Unfortunately, all existing GPU Datalog engines are built on binary joins, which are inadequate for the complex multi-way queries arising in production systems such as DOOP and ddisasm. For these queries, binary decomposition can incur an asymptotic blowup beyond the AGM bound in both time and space, leading to out-of-memory (OOM) failures regardless of join order. Worst-Case Optimal Joins (WCOJ) avoid this blowup, but their attribute-at-a-time intersections map poorly to SIMT hardware under key skew, causing severe load imbalance across Streaming Multiprocessors (SMs). We present SRDatalog, the first GPU Datalog engine based on WCOJ. SRDatalog uses flat columnar storage and two-phase deterministic memory allocation to avoid the OOM failures of binary joins and the index-rebuild overheads of static WCOJ systems. To mitigate skew and hide hardware stalls, SRDatalog further employs root-level histogram-guided load balancing, structural helper-relation splitting, and stream-aligned rule multiplexing. On real-world program-analysis workloads, SRDatalog achieves geometric-mean speedups of 21x to 47x.
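To make the contrast between binary joins and attribute-at-a-time intersection concrete, the following is a minimal sequential sketch on the triangle query Q(a, b, c) :- R(a, b), S(b, c), T(a, c). It is an illustration of the general idea only, not SRDatalog's GPU kernels; all function names and the skewed test instance are hypothetical.

```python
from collections import defaultdict


def index_by_first(rel):
    """Map each first-column value to the set of second-column values."""
    idx = defaultdict(set)
    for x, y in rel:
        idx[x].add(y)
    return idx


def triangles_binary(R, S, T):
    """Evaluate the triangle query with two binary joins."""
    s_by_b = index_by_first(S)
    # The first join materializes every (a, b, c) with R(a, b) and S(b, c).
    intermediate = [(a, b, c) for a, b in R for c in s_by_b.get(b, ())]
    t_set = set(T)
    result = {(a, b, c) for a, b, c in intermediate if (a, c) in t_set}
    return result, len(intermediate)


def triangles_wcoj(R, S, T):
    """Evaluate the same query by binding one variable at a time."""
    r_by_a = index_by_first(R)
    s_by_b = index_by_first(S)
    t_by_a = index_by_first(T)
    result = set()
    for a in r_by_a.keys() & t_by_a.keys():   # a occurs in both R and T
        for b in s_by_b.keys() & r_by_a[a]:   # b extends R(a, _) and starts S
            for c in s_by_b[b] & t_by_a[a]:   # c closes S(b, _) and T(a, _)
                result.add((a, b, c))
    return result


def skewed_edges(n):
    """Hypothetical skewed instance: a star around the hub value 0."""
    return [(0, j) for j in range(n + 1)] + [(j, 0) for j in range(1, n + 1)]
```

On `skewed_edges(n)` used for all three relations, the binary plan's intermediate result grows quadratically in n while the final output grows only linearly; the intersection-based version never materializes that intermediate. The same instance also shows the skew problem the abstract mentions: the iteration for the hub value `a = 0` does far more work than any other value of `a`.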
Problem

Research questions and friction points this paper is trying to address.

Datalog
GPU
Worst-Case Optimal Joins
Multi-way Queries
Load Imbalance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Worst-Case Optimal Joins
GPU Datalog
Load Balancing
Columnar Storage
Multi-way Joins