Automated Planning for Optimal Data Pipeline Instantiation

📅 2025-03-16

📈 Citations: 0

✨ Influential: 0

career value

173K/year

🤖 AI Summary

This work addresses the problem of efficient cluster deployment for data pipelines in datacenters. We propose a novel method that jointly optimizes computational resource allocation and operator scheduling. The problem is formulated as a planning problem with action costs, encoded in PDDL, to minimize end-to-end execution time while explicitly modeling data transfer overhead, operator execution requirements, and distributed resource constraints. Our key contribution is the introduction of a heuristic planning strategy guided by dataflow graph connectivity—marking the first systematic application of automated planning techniques to pipeline instantiation optimization. Experimental evaluation demonstrates significant improvements in compute-communication co-scheduling efficiency: across multiple benchmark scenarios, our approach reduces average end-to-end execution time by 23.7% compared to state-of-the-art baselines.

Technology Category

Application Category

📝 Abstract

Data pipeline frameworks provide abstractions for implementing sequences of data-intensive transformation operators, automating the deployment and execution of such transformations in a cluster. Deploying a data pipeline, however, requires computing resources to be allocated in a data center, ideally minimizing the overhead for communicating data and executing operators in the pipeline while considering each operator's execution requirements. In this paper, we model the problem of optimal data pipeline deployment as planning with action costs, where we propose heuristics aiming to minimize total execution time. Experimental results indicate that the heuristics can outperform the baseline deployment and that a heuristic based on connections outperforms other strategies.

Problem

Research questions and friction points this paper is trying to address.

Optimize data pipeline deployment in clusters

Minimize communication and execution overhead

Propose heuristics to reduce total execution time

Innovation

Methods, ideas, or system contributions that make the work stand out.

Modeling pipeline deployment as planning with action costs

Proposing heuristics to minimize total execution time

Heuristic based on connections outperforms other strategies

🔎 Similar Papers

No similar papers found.