Eudoxia: a FaaS scheduling simulator for the composable lakehouse

📅 2025-05-19
🤖 AI Summary
Problem: Composable data lakehouse (DLH) workloads running in Function-as-a-Service (FaaS) environments are hard to model for scheduling purposes and expensive to evaluate empirically. Method: This paper introduces the first lightweight, deterministic FaaS scheduling simulation framework tailored to lakehouse scenarios. Built on an event-driven simulation paradigm, it features an abstracted FaaS execution model, a pluggable scheduling policy interface, and fine-grained lakehouse workload modeling. Contribution/Results: Compared to conventional cloud-based simulation approaches, the framework substantially reduces the overhead of iterating on algorithms and adapting to infrastructure. It enables efficient, reproducible scheduling evaluation of diverse real-world lakehouse tasks (including ETL, ad-hoc queries, and hybrid streaming-batch jobs) within a unified function runtime. Its core innovation is the first deep integration of deterministic simulation with lakehouse-specific scheduling requirements, which establishes an extensible, principled validation baseline for scheduling research in cloud-native data systems.

📝 Abstract
Due to the variety of its target use cases and the large API surface area to cover, a data lakehouse (DLH) is a natural candidate for a composable data system. Bauplan is a composable DLH built on "spare data parts" and a unified Function-as-a-Service (FaaS) runtime for SQL queries and Python pipelines. While FaaS simplifies both building and using the system, it introduces novel challenges in scheduling and optimization of data workloads. In this work, starting from the programming model of the composable DLH, we characterize the underlying scheduling problem and motivate simulations as an effective tool to iterate on the DLH. We then introduce and release to the community Eudoxia, a deterministic simulator for scheduling data workloads as cloud functions. We show that Eudoxia can simulate a wide range of workloads and enables highly customizable user implementations of scheduling algorithms, providing a cheap mechanism for developers to evaluate different scheduling algorithms against their infrastructure.
Problem

Research questions and friction points this paper is trying to address.

Scheduling and optimizing FaaS-based data workloads in a composable lakehouse
Characterizing scheduling challenges in composable data lakehouse systems
Developing a simulator to evaluate customizable scheduling algorithms
Innovation

Methods, ideas, or system contributions that make the work stand out.

FaaS runtime for SQL and Python pipelines
Deterministic simulator for scheduling workloads
Customizable scheduling algorithm implementations
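
To make the idea of a deterministic simulator with swappable scheduling algorithms concrete, here is a minimal sketch. It is not Eudoxia's API: `Job`, `Policy`, `fifo`, `shortest_first`, and `simulate` are hypothetical names chosen for illustration, and the model (integer ticks, a fixed worker pool, non-preemptive jobs) is a deliberately simplified assumption.

```python
import heapq
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Job:
    """Hypothetical workload unit: one function invocation."""
    name: str
    arrival: int   # tick at which the job is submitted
    duration: int  # ticks of work on a single worker


# A policy returns the index (in the ready list) of the next job to run.
Policy = Callable[[List[Job]], int]


def fifo(ready: List[Job]) -> int:
    # Earliest arrival first; the name breaks ties deterministically.
    return min(range(len(ready)), key=lambda i: (ready[i].arrival, ready[i].name))


def shortest_first(ready: List[Job]) -> int:
    # Shortest duration first; the name breaks ties deterministically.
    return min(range(len(ready)), key=lambda i: (ready[i].duration, ready[i].name))


def simulate(jobs: List[Job], policy: Policy, workers: int = 1) -> Dict[str, int]:
    """Return the completion tick of every job under the given policy."""
    pending = sorted(jobs, key=lambda j: (j.arrival, j.name))
    ready: List[Job] = []
    running: list = []  # min-heap of (finish_tick, job_name)
    done: Dict[str, int] = {}
    now, free, i = 0, workers, 0

    while i < len(pending) or ready or running:
        # Admit every job that has arrived by the current tick.
        while i < len(pending) and pending[i].arrival <= now:
            ready.append(pending[i])
            i += 1
        # Retire jobs that have finished and release their workers.
        while running and running[0][0] <= now:
            finish, name = heapq.heappop(running)
            done[name] = finish
            free += 1
        # Let the policy fill the free workers.
        while free > 0 and ready:
            job = ready.pop(policy(ready))
            heapq.heappush(running, (now + job.duration, job.name))
            free -= 1
        # Jump straight to the next event (arrival or completion).
        upcoming = []
        if i < len(pending):
            upcoming.append(pending[i].arrival)
        if running:
            upcoming.append(running[0][0])
        if not upcoming:
            break
        now = max(now, min(upcoming))
    return done


jobs = [Job("etl", 0, 5), Job("report", 1, 3), Job("query", 2, 1)]
fifo_result = simulate(jobs, fifo)
sjf_result = simulate(jobs, shortest_first)
```

Because there is no randomness or wall-clock dependence, repeated runs on the same inputs produce identical results, so two policies can be compared on exactly the same workload without provisioning any infrastructure.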
Tapan Srivastava
University of Chicago
Jacopo Tagliabue
NYU
Artificial Intelligence, NLP, Cognitive Sciences
C. Greco
Bauplan Labs