🤖 AI Summary
Sideways information passing, as implemented in production query engines, propagates pruning information between operators in one direction only. Instance-optimal algorithms such as Yannakakis' enable bi-directional propagation, but they require an additional pass over the input, which has hindered their adoption. This paper presents Parachute, a step towards bi-directional information passing within a single pass. It statically analyzes the query plan to determine the tables between which information flow is blocked, then uses precomputed, join-induced fingerprint columns on the foreign-key (FK) tables to carry pruning information across those blocked paths without an extra pass. The approach trades space for time: the reported results assume a budget of 15% extra space. On the Join Order Benchmark (JOB), Parachute improves DuckDB v1.2's end-to-end execution time by 1.54× when DuckDB runs without semi-join filtering and by 1.24× when it runs with semi-join filtering.
📝 Abstract
Sideways information passing is a well-known technique for mitigating the impact of large build sides in a database query plan. As currently implemented in production systems, sideways information passing enables only a uni-directional information flow, as opposed to instance-optimal algorithms, such as Yannakakis'. On the other hand, the latter require an additional pass over the input, which hinders adoption in production systems. In this paper, we take a step towards enabling single-pass bi-directional information passing during query execution. We achieve this by statically analyzing the tables between which information flow is blocked and by leveraging precomputed join-induced fingerprint columns on FK-tables. On the JOB benchmark, our approach, Parachute, improves DuckDB v1.2's end-to-end execution time without and with semi-join filtering by 1.54x and 1.24x, respectively, when allowed to use 15% extra space.
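To make the idea of a join-induced fingerprint column concrete, the following is a minimal sketch and not Parachute's actual implementation. It assumes a toy PK table `title` and a toy FK table `movie_info`, a hypothetical 8-bit one-hot fingerprint of the joined row's `production_year`, and a single equality predicate on the PK side. All names, the fingerprint width, and the hashing scheme are illustrative assumptions. The fingerprint is computed offline along the FK join, so at query time the FK-table scan can discard rows using the PK-side predicate without first reading the PK table.

```python
# Illustrative sketch only: table names, NUM_BITS, and the fingerprint scheme
# are assumptions made for this example, not details taken from the paper.

NUM_BITS = 8  # hypothetical fingerprint width in bits


def fingerprint(year: int) -> int:
    """Map an attribute value to a one-hot bit inside a small bitmask."""
    return 1 << (year % NUM_BITS)


# PK table: title.id -> title.production_year
title = {1: 2000, 2: 1995, 3: 2000, 4: 2010}

# FK table: (movie_info.id, movie_info.movie_id)
movie_info = [(10, 1), (11, 2), (12, 2), (13, 3), (14, 4), (15, 4)]

# Offline step: precompute the fingerprint column on the FK table by
# following the FK join once and storing a summary of the joined PK row.
fp_column = [fingerprint(title[fk]) for _, fk in movie_info]


def scan_with_fingerprint(fk_rows, fingerprints, mask):
    """Single scan of the FK table that keeps rows whose fingerprint overlaps the mask."""
    return [row for row, fp in zip(fk_rows, fingerprints) if fp & mask]


def join_on_year(fk_rows, pk_table, year):
    """Exact join with the PK-side predicate; removes fingerprint false positives."""
    return [(rid, fk) for rid, fk in fk_rows if pk_table.get(fk) == year]


# Query time: translate the PK-side predicate (production_year = 2000)
# into a fingerprint mask and prune during the FK-table scan.
mask = fingerprint(2000)
pruned = scan_with_fingerprint(movie_info, fp_column, mask)
result = join_on_year(pruned, title, 2000)
```

Because distinct values can share a fingerprint bit, the pruned scan may keep false positives, so the exact join predicate is still applied afterwards. The extra column is where the space-for-time trade-off shows up in this sketch.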