One Join Order Does Not Fit All: Reducing Intermediate Results with Per-Split Query Plans

📅 2025-10-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the explosion of intermediate results in multi-way joins, especially cyclic ones, this paper introduces SplitJoin, a framework that elevates *split* to a first-class query operator. It uses threshold-based partitioning to divide input tables into "heavy" and "light" parts, then chooses a separate join order and execution plan for each partition. This departs from the conventional single-plan paradigm and lets optimization adapt to the data distribution. SplitJoin explores threshold selection, split strategies, and split-aware join ordering, and is implemented as a front-end to both DuckDB and Umbra. On social network queries, SplitJoin completes 43 queries on DuckDB (vs. 29 natively), with a 2.1× average speedup and 7.9× smaller intermediate results. On Umbra, it completes 45 queries (vs. 35 natively), with a 1.3× average speedup and 1.2× smaller intermediate results. The work thus combines logical partitioning, plan specialization, and practical integration with existing binary join engines.

📝 Abstract
Minimizing intermediate results is critical for efficient multi-join query processing. Although the seminal Yannakakis algorithm offers strong guarantees for acyclic queries, cyclic queries remain an open challenge. In this paper, we propose SplitJoin, a framework that introduces split as a first-class query operator. By partitioning input tables into heavy and light parts, SplitJoin allows different data partitions to use distinct query plans, with the goal of reducing intermediate sizes using existing binary join engines. We systematically explore the design space for split-based optimizations, including threshold selection, split strategies, and join ordering after splits. Implemented as a front-end to DuckDB and Umbra, SplitJoin achieves substantial improvements: on DuckDB, SplitJoin completes 43 social network queries (vs. 29 natively), achieving 2.1x faster runtime and 7.9x smaller intermediates on average (up to 13.6x and 74x, respectively); on Umbra, it completes 45 queries (vs. 35), achieving 1.3x speedups and 1.2x smaller intermediates on average (up to 6.1x and 2.1x, respectively).
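To make the heavy/light split concrete, here is a minimal sketch of partitioning one relation by the degree of a join attribute. The function name `split_heavy_light` and the default threshold of floor(sqrt(|R|)) are assumptions borrowed from classic heavy-light analyses, not SplitJoin's actual threshold-selection strategy.

```python
from collections import Counter
from math import isqrt

def split_heavy_light(rel, attr_idx, threshold=None):
    """Hypothetical helper: partition `rel` (a list of tuples) into heavy and
    light parts by the degree of the value at position `attr_idx`.

    A value is heavy if it occurs in more than `threshold` tuples.
    Assumed default threshold: floor(sqrt(|rel|)).
    """
    if threshold is None:
        threshold = isqrt(len(rel))
    degree = Counter(t[attr_idx] for t in rel)
    heavy = [t for t in rel if degree[t[attr_idx]] > threshold]
    light = [t for t in rel if degree[t[attr_idx]] <= threshold]
    return heavy, light, threshold

# Example: a skewed edge relation R(a, b) in which b = 0 is a hub.
R = [(i, 0) for i in range(9)] + [(1, 1), (2, 2), (3, 2)]
heavy, light, theta = split_heavy_light(R, attr_idx=1)
print(theta, len(heavy), len(light))
```

The split is lossless and disjoint: every tuple lands in exactly one part, so a query over the full relation equals the union of the same query over each part. That property is what lets each part be evaluated with its own plan.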
Problem

Reducing intermediate results in multi-join query processing
Optimizing cyclic queries, for which the Yannakakis algorithm's guarantees do not apply
Enabling per-split query plans for efficient binary join execution
Innovation

Introduces split as a first-class query operator
Partitions tables into heavy and light parts
Uses distinct query plans per data partition
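The idea of per-partition plans can be illustrated with the triangle query R(a,b) ⋈ S(b,c) ⋈ T(c,a), the simplest cyclic join. The sketch below is a minimal illustration under stated assumptions: splitting S on b, the sqrt(N) threshold, and the two join orders are textbook heavy-light choices picked for clarity, not SplitJoin's own split strategy or join ordering. All function names are hypothetical. It compares one join order for all data against different binary join orders for the light and heavy parts, and counts the intermediate tuples each materializes.

```python
from collections import Counter, defaultdict
from math import isqrt

def hash_join(left, right, keys):
    """Binary hash join of two lists of dicts on equality of all `keys`."""
    index = defaultdict(list)
    for r in right:
        index[tuple(r[k] for k in keys)].append(r)
    return [{**l, **r} for l in left
            for r in index.get(tuple(l[k] for k in keys), [])]

def as_rel(edges, x, y):
    """Rename a binary edge list to a relation with attributes (x, y)."""
    return [{x: u, y: v} for u, v in edges]

def triangles_single_plan(R, S, T):
    # One join order for all data: (R ⋈ S) ⋈ T.
    inter = hash_join(R, S, ["b"])
    out = hash_join(inter, T, ["c", "a"])
    return {(t["a"], t["b"], t["c"]) for t in out}, len(inter)

def triangles_per_split(R, S, T, threshold):
    # Split S on b: a value is heavy if it occurs in more than `threshold` tuples.
    deg = Counter(s["b"] for s in S)
    S_heavy = [s for s in S if deg[s["b"]] > threshold]
    S_light = [s for s in S if deg[s["b"]] <= threshold]
    # Light part, order (R ⋈ S_light) ⋈ T: each R tuple matches at most
    # `threshold` S_light tuples, so |R ⋈ S_light| <= |R| * threshold.
    inter_light = hash_join(R, S_light, ["b"])
    out_light = hash_join(inter_light, T, ["c", "a"])
    # Heavy part, order (S_heavy ⋈ T) ⋈ R: each T tuple matches at most one
    # S_heavy tuple per heavy b value, and there are at most |S| / threshold.
    inter_heavy = hash_join(S_heavy, T, ["c"])
    out_heavy = hash_join(inter_heavy, R, ["a", "b"])
    result = {(t["a"], t["b"], t["c"]) for t in out_light + out_heavy}
    return result, len(inter_light) + len(inter_heavy)

# Toy graph: vertex 0 is a hub with 20 in-edges and 20 out-edges, plus a 3-cycle.
E = [(i, 0) for i in range(1, 21)] + [(0, i) for i in range(1, 21)]
E += [(1, 2), (2, 3), (3, 1)]
R, S, T = as_rel(E, "a", "b"), as_rel(E, "b", "c"), as_rel(E, "c", "a")
theta = isqrt(len(E))
single, single_inter = triangles_single_plan(R, S, T)
split, split_inter = triangles_per_split(R, S, T, theta)
print(single == split, single_inter, split_inter)
```

Both strategies return the same triangles because the light and heavy parts of S are disjoint and cover S. On this toy graph, though, the single plan pays for every pair of hub edges in R ⋈ S, while the per-split version routes the hub through the S_heavy ⋈ T order and materializes far fewer intermediate tuples, all using only ordinary binary hash joins.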
Yujun He
University of Wisconsin-Madison
Hangdong Zhao
Microsoft Gray Systems Lab
Simon Frisk
University of Wisconsin-Madison
Yifei Yang
Shanghai Jiao Tong University
Natural Language Processing
Kevin Kristensen
University of Wisconsin-Madison
Paraschos Koutris
Computer Sciences, University of Wisconsin-Madison
data management
Xiangyao Yu
University of Wisconsin-Madison
Databases, Computer Architecture