Eudoxia: a FaaS scheduling simulator for the composable lakehouse

📅 2025-05-19

🤖 AI Summary

Composable data lakehouse (DLH) workloads running on Function-as-a-Service (FaaS) runtimes are hard to model as a scheduling problem and expensive to evaluate empirically. Method: This paper introduces Eudoxia, a lightweight, deterministic FaaS scheduling simulator tailored to lakehouse scenarios. Built on an event-driven simulation paradigm, it provides an abstracted FaaS execution model, a pluggable scheduling policy interface, and fine-grained lakehouse workload modeling. Contribution/Results: Compared with evaluating schedulers directly on cloud infrastructure, the simulator substantially reduces the cost of iterating on algorithms and adapting them to a given infrastructure. It enables efficient, reproducible scheduling evaluation of diverse lakehouse tasks (including ETL, ad-hoc queries, and hybrid streaming-batch jobs) within a unified function runtime. Its core contribution is the integration of deterministic simulation with lakehouse-specific scheduling requirements, giving scheduling research in cloud-native data systems an extensible validation baseline.

📝 Abstract

Due to the variety of its target use cases and the large API surface area to cover, a data lakehouse (DLH) is a natural candidate for a composable data system. Bauplan is a composable DLH built on "spare data parts" and a unified Function-as-a-Service (FaaS) runtime for SQL queries and Python pipelines. While FaaS simplifies both building and using the system, it introduces novel challenges in scheduling and optimization of data workloads. In this work, starting from the programming model of the composable DLH, we characterize the underlying scheduling problem and motivate simulations as an effective tool to iterate on the DLH. We then introduce and release to the community Eudoxia, a deterministic simulator for scheduling data workloads as cloud functions. We show that Eudoxia can simulate a wide range of workloads and enables highly customizable user implementations of scheduling algorithms, providing a cheap mechanism for developers to evaluate different scheduling algorithms against their infrastructure.
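
To make the idea of a deterministic simulator with user-supplied scheduling algorithms concrete, here is a minimal sketch in Python. It is not Eudoxia's API: the names `Job`, `Policy`, `fifo`, `shortest_first`, and `simulate` are hypothetical, time is measured in abstract integer ticks, and each job is assumed to occupy exactly one worker for a fixed duration. The sketch only illustrates the general pattern of an event queue with deterministic tie-breaking and a policy passed in as a plain function.

```python
import heapq
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Job:
    """Hypothetical stand-in for one cloud-function invocation."""
    name: str
    arrival: int   # tick at which the job is submitted
    duration: int  # ticks of work once the job starts


# A policy receives the non-empty waiting queue and returns the job to run next.
Policy = Callable[[List[Job]], Job]


def fifo(queue: List[Job]) -> Job:
    # Earliest arrival first; the name breaks ties so the choice is deterministic.
    return min(queue, key=lambda j: (j.arrival, j.name))


def shortest_first(queue: List[Job]) -> Job:
    # Shortest duration first; the name breaks ties.
    return min(queue, key=lambda j: (j.duration, j.name))


def simulate(jobs: List[Job], policy: Policy, workers: int = 1) -> Dict[str, int]:
    """Run the event loop and return the completion tick of each job."""
    events = []  # heap of (time, seq, kind, job); seq is unique, so ordering is total
    ordered = sorted(jobs, key=lambda j: (j.arrival, j.name))
    for seq, job in enumerate(ordered):
        heapq.heappush(events, (job.arrival, seq, "arrive", job))
    seq = len(ordered)

    queue: List[Job] = []
    free = workers
    finished: Dict[str, int] = {}

    while events:
        now = events[0][0]
        # Apply every event scheduled for this tick before making decisions.
        while events and events[0][0] == now:
            _, _, kind, job = heapq.heappop(events)
            if kind == "arrive":
                queue.append(job)
            else:  # "finish"
                free += 1
                finished[job.name] = now
        # Let the pluggable policy fill any idle workers.
        while free > 0 and queue:
            chosen = policy(queue)
            queue.remove(chosen)
            free -= 1
            heapq.heappush(events, (now + chosen.duration, seq, "finish", chosen))
            seq += 1
    return finished


# Illustrative workload (made-up numbers, not taken from the paper).
jobs = [Job("etl", 0, 5), Job("report", 1, 3), Job("query", 2, 1)]
fifo_result = simulate(jobs, fifo)
sjf_result = simulate(jobs, shortest_first)
```

Because there is no randomness or wall-clock dependence, repeated runs over the same inputs yield identical results, which is the property that makes comparing two policies on the same workload cheap and reproducible. Swapping the policy requires changing only the function passed to `simulate`.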

Problem

Research questions and friction points this paper is trying to address.

Scheduling and optimizing FaaS-based data workloads in a composable lakehouse

Characterizing scheduling challenges in composable data lakehouse systems

Developing a simulator to evaluate customizable scheduling algorithms

Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified FaaS runtime for SQL queries and Python pipelines

Deterministic simulator for scheduling workloads

Customizable scheduling algorithm implementations

🔎 Similar Papers

CloudNativeSim: a toolkit for modeling and simulation of cloud-native applications