🤖 AI Summary
This work addresses the difficulty of efficiently optimizing hybrid SQL and AI/ML queries, which stems from the black-box nature of ML models, complex data dependencies, and a vast optimization space, compounded by the absence of a unified evaluation framework. To bridge this gap, the authors present an interactive benchmarking platform built on DuckDB that combines a unified execution backend with a web-based visualization interface. The system supports abstract logical-plan representation, extensible rewrite rules, traceable optimization decisions, and comparative analysis of multiple optimization strategies. For the first time, this platform enables transparent and reproducible development of SQL+AI/ML co-optimizers alongside end-to-end benchmarking, letting users visually inspect the optimization process and quantitatively compare the performance of diverse optimization approaches.
📝 Abstract
Database workloads increasingly interleave artificial intelligence (AI) and machine learning (ML) pipelines and model inference with data processing, yielding hybrid SQL+AI/ML queries that mix relational operators with expensive, opaque AI/ML operators, often expressed as UDFs. These workloads are challenging to optimize for several reasons: ML operators behave like black boxes; data-dependent effects such as sparsity, selectivity, and cardinalities can dominate runtime; domain experts often rely on practical heuristics that are difficult to express in monolithic optimizers; and AI/ML operators introduce numerous co-optimization opportunities, such as factorization, pushdown, ML-to-SQL conversion, and linear-algebra-to-relational-algebra rewrites, that significantly enlarge the search space of equivalent execution plans. At the same time, research prototypes for SQL+ML optimization are difficult to evaluate fairly because they are typically developed on different platforms and evaluated using different queries.
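To make the co-optimization opportunities concrete, the sketch below illustrates one such rewrite on a toy logical plan: pushing a filter below an ML inference operator when the predicate does not reference the model's output column, so the expensive UDF runs on fewer rows. All class and function names here are hypothetical, for illustration only; they are not OptBench's API.

```python
from dataclasses import dataclass

# Minimal logical-plan nodes (hypothetical names, for illustration only).
@dataclass(frozen=True)
class Scan:
    table: str

@dataclass(frozen=True)
class MLInfer:            # applies an opaque ML UDF, adding an output column
    child: object
    model: str
    out_col: str

@dataclass(frozen=True)
class Filter:
    child: object
    pred_cols: frozenset  # columns the predicate references

def push_filter_below_ml(plan):
    """Rewrite Filter(MLInfer(x)) -> MLInfer(Filter(x)) when the predicate
    does not touch the ML output column, so fewer rows reach the model."""
    if (isinstance(plan, Filter) and isinstance(plan.child, MLInfer)
            and plan.child.out_col not in plan.pred_cols):
        ml = plan.child
        return MLInfer(Filter(ml.child, plan.pred_cols), ml.model, ml.out_col)
    return plan

# Predicate on a base column ("lang"), not on the model output ("score"),
# so the rewrite fires and inference runs only on the filtered rows.
before = Filter(MLInfer(Scan("reviews"), "sentiment", "score"),
                frozenset({"lang"}))
after = push_filter_below_ml(before)
```

The same pattern-match-and-rewrite shape extends to the other rewrites named above (factorization, ML-to-SQL conversion), which is what makes the search space of equivalent plans so large.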
We present OptBench, an interactive workbench for building and benchmarking query optimizers for hybrid SQL+AI/ML queries in a transparent, apples-to-apples manner. OptBench runs all optimizers on a unified DuckDB backend and exposes an interactive web interface that allows users to (i) construct query optimizers by leveraging and extending abstract logical-plan rewrite actions, (ii) benchmark and compare different optimizer implementations over a suite of diverse queries while recording decision traces and latency, and (iii) visualize the logical plans produced by different optimizers side by side. The system enables practitioners and researchers to prototype optimizer ideas, inspect plan transformations, and quantitatively compare optimizer designs on multimodal inference queries within a single workbench.
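The apples-to-apples comparison described above can be sketched in miniature: treat each "optimizer" as an ordered list of rewrite rules applied to the same logical plan, and record which rules fired (the decision trace) plus wall-clock optimization time. This is a hypothetical illustration of the benchmarking idea, not OptBench's actual interface; plans are plain tuples and `push_filter` is a toy rule.

```python
import time

def optimize(plan, rules):
    """Apply named rewrite rules in order, recording which ones fired."""
    trace = []
    for name, rule in rules:
        new_plan = rule(plan)
        if new_plan != plan:
            trace.append(name)
            plan = new_plan
    return plan, trace

def benchmark(plan, optimizers):
    """Run several optimizers on the same plan; collect plan, trace, latency."""
    results = {}
    for opt_name, rules in optimizers.items():
        start = time.perf_counter()
        final_plan, trace = optimize(plan, rules)
        results[opt_name] = {
            "plan": final_plan,
            "trace": trace,
            "seconds": time.perf_counter() - start,
        }
    return results

# Toy rule: swap Filter above MLInfer to Filter below it (tuple-encoded plan).
def push_filter(plan):
    if plan[0] == "Filter" and plan[1][0] == "MLInfer":
        return ("MLInfer", ("Filter", plan[1][1]))
    return plan

plan = ("Filter", ("MLInfer", ("Scan",)))
optimizers = {"baseline": [], "pushdown": [("push_filter", push_filter)]}
results = benchmark(plan, optimizers)
```

Because every optimizer sees the same input plan and the same measurement harness, differences in the resulting plans and traces are attributable to the rewrite strategies themselves.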