🤖 AI Summary
Existing serverless data analytics systems rely on manual tuning or static execution plans, making it infeasible to automatically identify Pareto-optimal query plans that jointly minimize cost and latency amid an exponential space of feasible plans. This paper proposes the first end-to-end, serverless-native query optimization framework. It introduces a state-space pruning strategy and a novel search algorithm to automatically discover Pareto-optimal plans without human intervention. The framework integrates a lightweight cost model, an adaptive planner, and a FaaS-native execution engine to enable real-time plan evaluation and execution for complex queries. Experiments across diverse real-world workloads demonstrate that the system reduces total cost by 37% on average and improves query latency by 2.1× over AWS Athena. The authors present this as the first approach to achieve automated, provably Pareto-optimal query optimization in serverless environments.
📝 Abstract
Running data analytics queries on serverless (FaaS) workers has been shown to be cost- and performance-efficient for a variety of real-world scenarios, including intermittent query arrival patterns, sudden load spikes and management challenges that afflict managed VM clusters. Alas, existing work on serverless data analytics focuses primarily on the serverless execution engine and either assumes the existence of a "good" query execution plan or relies on user guidance to construct one. Meanwhile, even simple analytics queries on serverless have a huge space of possible plans, with vast differences in both performance and cost among plans.
This paper introduces Odyssey, an end-to-end serverless-native data analytics pipeline that integrates a query planner, cost model and execution engine. Odyssey automatically generates and evaluates serverless query plans, utilizing state space pruning heuristics and a novel search algorithm to identify, with low latency even for complex queries, Pareto-optimal plans that balance cost and performance. Our evaluations demonstrate that Odyssey accurately predicts both monetary cost and latency, and consistently outperforms AWS Athena on cost and/or latency.
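To make the notion of a Pareto-optimal plan concrete, the following minimal Python sketch filters a set of candidate plans down to those not dominated on both cost and latency. It is an illustration only, not Odyssey's search algorithm: the `Plan` type, the `pareto_front` function, and the example plans with their numbers are all hypothetical, and a real planner would prune the plan space rather than enumerate it.

```python
from typing import List, NamedTuple


class Plan(NamedTuple):
    """Hypothetical candidate plan with predicted cost and latency."""
    name: str
    cost_usd: float
    latency_s: float


def pareto_front(plans: List[Plan]) -> List[Plan]:
    """Return the plans that no other plan beats on both cost and latency.

    Plans are scanned in order of increasing cost (ties broken by latency).
    A plan is kept only if it is strictly faster than every cheaper-or-equal
    plan seen so far. Exact duplicates of a kept plan are dropped as well.
    """
    front: List[Plan] = []
    best_latency = float("inf")
    for plan in sorted(plans, key=lambda p: (p.cost_usd, p.latency_s)):
        if plan.latency_s < best_latency:
            front.append(plan)
            best_latency = plan.latency_s
    return front


# Made-up candidates, purely for illustration.
EXAMPLE_PLANS = [
    Plan("few-large-workers", 0.10, 30.0),
    Plan("balanced", 0.20, 12.0),
    Plan("skewed-shuffle", 0.25, 15.0),   # costlier and slower than "balanced"
    Plan("many-small-workers", 0.40, 8.0),
    Plan("many-small-no-cache", 0.40, 9.0),  # same cost, slower
]

if __name__ == "__main__":
    for plan in pareto_front(EXAMPLE_PLANS):
        print(plan)
```

In this example the front contains three plans, each trading more cost for less latency; the other two are dominated and would never be worth choosing under either objective.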