GRACEFUL: A Learned Cost Estimator For UDFs

📅 2025-03-31
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the suboptimal query plans that result when DBMS optimizers estimate the cost of User-Defined Functions (UDFs) inaccurately, this paper proposes GRACEFUL, a learned cost estimator designed specifically for UDFs. The model predicts the cost of query plans that contain UDFs, which lets the optimizer make informed pull-up/push-down decisions for UDF filters instead of relying on static assumptions. Key contributions include: (i) a learned cost model for query plans with UDFs; (ii) speedups of up to 50× from cost-informed filter placement, compared with the standard case of always applying filter push-down; and (iii) a released synthetic dataset of over 90,000 UDF queries to support further research.

📝 Abstract
User-Defined Functions (UDFs) are a pivotal feature in modern DBMSs, enabling the extension of native DBMS functionality with custom logic. However, integrating UDFs into query optimization poses significant challenges, primarily because UDF execution costs are difficult to estimate. Consequently, existing cost models in DBMS optimizers largely ignore UDFs or rely on static assumptions, resulting in suboptimal performance for queries involving UDFs. In this paper, we introduce GRACEFUL, a novel learned cost model that makes accurate cost predictions for query plans with UDFs, enabling optimization decisions for UDFs in DBMSs. For example, as we show in our evaluation, our cost model enables speedups of up to 50x through informed pull-up/push-down decisions for UDF filters, compared to the standard case where a filter push-down is always applied. Additionally, we release a synthetic dataset of over 90,000 UDF queries to promote further research in this area.
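The pull-up/push-down decision mentioned in the abstract can be illustrated with a toy comparison. The sketch below is not GRACEFUL's model: it replaces the learned cost predictions with a hand-written linear formula, and every name in it (PlanInputs, pushdown_cost, pullup_cost, choose_placement) and every number is a hypothetical chosen for illustration. It only shows why always pushing a UDF filter below a join can be the wrong choice when the UDF is expensive and the join is selective.

```python
from dataclasses import dataclass


@dataclass
class PlanInputs:
    """Hypothetical inputs; in a real system these would come from a cost model."""
    base_rows: int            # rows in the table the UDF filter reads
    join_selectivity: float   # fraction of base rows that survive the join
    udf_selectivity: float    # fraction of rows that pass the UDF filter
    udf_cost_per_row: float   # assumed cost of one UDF invocation
    join_cost_per_row: float  # assumed cost of joining one input row


def pushdown_cost(p: PlanInputs) -> float:
    # Push-down: the UDF filter runs on every base row,
    # then the join only sees the rows that passed the filter.
    udf = p.base_rows * p.udf_cost_per_row
    join = p.base_rows * p.udf_selectivity * p.join_cost_per_row
    return udf + join


def pullup_cost(p: PlanInputs) -> float:
    # Pull-up: the join runs on every base row,
    # then the UDF filter only sees the rows that survived the join.
    join = p.base_rows * p.join_cost_per_row
    udf = p.base_rows * p.join_selectivity * p.udf_cost_per_row
    return join + udf


def choose_placement(p: PlanInputs) -> str:
    # Pick the placement with the lower estimated cost (ties go to push-down).
    return "push-down" if pushdown_cost(p) <= pullup_cost(p) else "pull-up"


# Expensive UDF, highly selective join: running the UDF after the join is cheaper.
expensive_udf = PlanInputs(1_000_000, 0.01, 0.5, 50.0, 1.0)
# Cheap UDF, non-selective join: the usual push-down heuristic wins.
cheap_udf = PlanInputs(1_000_000, 0.9, 0.1, 0.1, 1.0)

print(choose_placement(expensive_udf))
print(choose_placement(cheap_udf))
```

In this toy setting the static "always push down" rule is correct only for the second input. A learned estimator would take the place of the two hand-written cost functions, supplying predicted costs for each candidate plan.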
Problem

Research questions and friction points this paper is trying to address.

Estimating execution costs of User-Defined Functions in DBMSs
Improving query optimization decisions involving UDFs
Addressing suboptimal performance due to static UDF cost assumptions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Learned cost model for UDFs
Accurate UDF execution cost predictions
Enables cost-based optimization decisions for UDFs in DBMSs, such as filter pull-up versus push-down