🤖 AI Summary
This work addresses two problems: general-purpose relational database management systems (RDBMSes) are often slower than custom-built solutions on specific queries, and workload-specific query accelerators see little adoption because their engineering cost outweighs their isolated performance gains. The authors propose Tailwind, an external query planner that brings diverse accelerators into any RDBMS that supports data import/export. Its key abstraction is the abstract logical plan (ALP), a mostly declarative way to describe accelerators over relational operators, from which Tailwind automatically builds neural network models that predict when a given accelerator is beneficial. At runtime, Tailwind sits atop the RDBMS and transparently rewrites queries to run across one or more accelerators when a benefit is predicted, falling back to the underlying RDBMS otherwise. With a library of four accelerators on Amazon Redshift and DuckDB, Tailwind speeds up TPC-H queries by 1.38× on average, with peak improvements reaching 29×.
📝 Abstract
Relational database management systems (RDBMSes) can process general-purpose queries, but often have lower performance compared to custom-built solutions for specific queries. For example, consider a group-by query over a few known groups (e.g., grouping by country). While an RDBMS would likely use a hash map to do the grouping, a faster method could hard-code the expected groups into the query executor. But such workload-specific techniques, which we call query accelerators, are not widely used in practice because the engineering effort (optimizer and engine changes, potential bugs) does not justify the isolated performance gains (speedup on a single specific query). We propose Tailwind: an external query planner that brings accelerators into any RDBMS that supports data import/export. Users define their accelerators using abstract logical plans (ALPs): a new mostly-declarative abstraction over relational operators built on regular tree expressions. ALPs allow Tailwind to automatically build customized neural network models to estimate when using a particular accelerator is beneficial. At runtime, Tailwind sits atop an RDBMS and transparently rewrites queries to run across one or more accelerators when predicted to be beneficial, falling back to the underlying RDBMS when not. On Redshift and DuckDB with a library of four diverse accelerators, Tailwind accelerates TPC-H queries by 1.38x on average (up to 29x).
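The group-by example in the abstract can be illustrated with a minimal Python sketch. This is not Tailwind's code or API: the country codes, the function names, and the fall-back rule are all hypothetical and chosen only for illustration. It contrasts a generic hash-map aggregation with a version that hard-codes the expected groups and signals a fallback when an unexpected group appears.

```python
from collections import defaultdict


def group_sum_generic(rows):
    """General-purpose path: aggregate with a hash map, as an RDBMS likely would."""
    totals = defaultdict(float)
    for country, amount in rows:
        totals[country] += amount
    return dict(totals)


def group_sum_known_groups(rows):
    """Accelerated path (hypothetical): the expected groups are hard-coded.

    Returns None if a row falls outside the hard-coded groups, so the
    caller can fall back to the generic path.
    """
    de = fr = us = 0.0
    seen_de = seen_fr = seen_us = False
    for country, amount in rows:
        if country == "DE":
            de += amount
            seen_de = True
        elif country == "FR":
            fr += amount
            seen_fr = True
        elif country == "US":
            us += amount
            seen_us = True
        else:
            return None  # unexpected group: this accelerator does not apply
    result = {}
    if seen_de:
        result["DE"] = de
    if seen_fr:
        result["FR"] = fr
    if seen_us:
        result["US"] = us
    return result


def group_sum(rows):
    """Try the hard-coded path first, then fall back to the generic one."""
    rows = list(rows)  # materialize so the rows can be scanned twice if needed
    result = group_sum_known_groups(rows)
    return result if result is not None else group_sum_generic(rows)


if __name__ == "__main__":
    print(group_sum([("US", 1.0), ("DE", 2.0), ("US", 3.0)]))
    print(group_sum([("US", 1.0), ("JP", 5.0)]))
```

In interpreted Python the hard-coded branch is not necessarily faster than the dictionary; the sketch only shows the shape of the specialization and of the fallback. Note also that this sketch decides to fall back after the fact, whereas the abstract describes Tailwind as predicting ahead of time, with a learned model, whether an accelerator will be beneficial.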