Skyrise: Exploiting Serverless Cloud Infrastructure for Elastic Data Processing

📅 2025-01-14
🤖 AI Summary
Cloud-based data processing systems suffer from low resource utilization, limited performance robustness, and high cost under unpredictable workloads. To address this, the paper presents Skyrise, an effort towards the first end-to-end, fully serverless SQL query processor. The system runs compute on cloud functions (e.g., AWS Lambda) and keeps data in serverless storage (e.g., Amazon S3), so both layers of the stack are serverless. It introduces three key techniques: (1) adaptive execution scheduling, (2) cost-aware resource orchestration, and (3) dynamic query decomposition with elastic workflow orchestration and operator-level, cost-driven deployment. Together, these target the performance and cost limitations of function-as-a-service for long-running analytical workloads. Evaluated on terabyte-scale queries of the TPC-H benchmark, the system is competitive with other cloud data systems in both query performance and cost. The results indicate that a fully serverless architecture is feasible for complex analytical workloads.

📝 Abstract
Serverless computing offers elasticity unmatched by conventional server-based cloud infrastructure. Although modern data processing systems embrace serverless storage, such as Amazon S3, they continue to manage their compute resources as servers. This is challenging for unpredictable workloads and often leaves clusters underutilized. Recent research shows the potential of serverless compute resources, such as cloud functions, for elastic data processing, but also reveals limitations in performance robustness and cost efficiency for long-running workloads. These challenges require holistic approaches across the system stack. However, to the best of our knowledge, there is no end-to-end data processing system built entirely on serverless infrastructure. In this paper, we present Skyrise, our effort towards building the first fully serverless SQL query processor. Skyrise exploits the elasticity of its underlying infrastructure while alleviating the inherent limitations with a number of adaptive and cost-aware techniques. We show that both Skyrise's performance and cost are competitive with other cloud data systems for terabyte-scale queries of the analytical TPC-H benchmark.

Problem

Research questions and friction points this paper is trying to address.

Cloud Computing
Resource Utilization
Serverless Architecture

Innovation

Methods, ideas, or system contributions that make the work stand out.

Serverless Technology
SQL Query Processing
Cost-Optimization
Thomas Bodner
Hasso Plattner Institute, University of Potsdam, Potsdam, Germany
Daniel Ritter
SAP
Cloud Data Systems, Database Systems, Modern Hardware, Distributed Systems, Formal Methods
Martin Boissier
Hasso Plattner Institute, University of Potsdam, Potsdam, Germany
Tilmann Rabl
Hasso Plattner Institute, University of Potsdam