Predicting the Performance of Scientific Workflow Tasks for Cluster Resource Management: An Overview of the State of the Art

📅 2025-04-29
🤖 AI Summary
Scientific workflow management systems hand tasks to cluster resource managers, which need estimates of each task's runtime and main memory consumption to schedule efficiently. Estimates supplied by users are error-prone, so the chapter argues that they should be produced automatically. It describes the key characteristics of methods for predicting workflow task runtime and memory, gives an overview and detailed comparison of state-of-the-art methods from the literature, and discusses how such predictions support scheduling, energy-efficient and carbon-aware computing, and cost prediction.

📝 Abstract
Scientific workflow management systems support large-scale data analysis on cluster infrastructures. For this, they interact with resource managers which schedule workflow tasks onto cluster nodes. In addition to workflow task descriptions, resource managers rely on task performance estimates such as main memory consumption and runtime to efficiently manage cluster resources. Such performance estimates should be automated, as user-based task performance estimates are error-prone. In this book chapter, we describe key characteristics of methods for workflow task runtime and memory prediction, provide an overview and a detailed comparison of state-of-the-art methods from the literature, and discuss how workflow task performance prediction is useful for scheduling, energy-efficient and carbon-aware computing, and cost prediction.
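
As a minimal illustration of what an automated task performance estimate can look like, the sketch below fits one least-squares line per target (runtime and peak memory) against the input size of earlier runs of the same task. It is not a method taken from the chapter. The linear relationship, the function names `fit_linear` and `predict_task`, the history format, and the 10% memory headroom are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class LinearModel:
    """A fitted line y = slope * x + intercept."""
    slope: float
    intercept: float

    def predict(self, x: float) -> float:
        return self.slope * x + self.intercept


def fit_linear(xs, ys):
    """Ordinary least-squares fit of ys against xs (single feature)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        raise ValueError("all input sizes are identical; cannot fit a slope")
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x
    return LinearModel(slope, mean_y - slope * mean_x)


def predict_task(history, input_gb, memory_headroom=0.1):
    """Estimate (runtime_s, memory_gb) for a new run of a task.

    history: list of (input_gb, runtime_s, peak_memory_gb) tuples from
             earlier executions (hypothetical format).
    memory_headroom: assumed safety margin added to the memory estimate,
             since under-allocating memory typically makes a task fail.
    """
    sizes = [h[0] for h in history]
    runtime_model = fit_linear(sizes, [h[1] for h in history])
    memory_model = fit_linear(sizes, [h[2] for h in history])
    runtime_s = runtime_model.predict(input_gb)
    memory_gb = memory_model.predict(input_gb) * (1 + memory_headroom)
    return runtime_s, memory_gb


if __name__ == "__main__":
    past_runs = [(1.0, 10.0, 2.0), (2.0, 20.0, 4.0), (3.0, 30.0, 6.0)]
    print(predict_task(past_runs, input_gb=4.0))
```

A resource manager could pass such estimates on as the task's time limit and memory request. The headroom is applied only to memory here because an underestimated memory request usually kills the task, while an underestimated runtime mainly degrades the schedule.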

Problem

Research questions and friction points this paper is trying to address.

Predicting workflow task performance for cluster resource management
Automating memory and runtime estimates to replace error-prone user inputs
Comparing state-of-the-art methods for performance prediction in scheduling

Innovation

Methods, ideas, or system contributions that make the work stand out.

Automated workflow task performance prediction
Comparison of state-of-the-art prediction methods
Performance prediction for resource-efficient scheduling
Jonathan Bader
TU Berlin
Resource Management, Distributed Systems, Scientific Workflows
Kathleen West
University of Glasgow, School of Computing Science, Glasgow, United Kingdom
Soeren Becker
TU Berlin
Distributed Systems, Edge Computing, Self-adaptive Systems, AIOps
S. Kulagina
Humboldt-Universität zu Berlin, Department of Computer Science, Berlin, Germany; Karlsruhe Institute of Technology (KIT), Scientific Computing Center, Karlsruhe, Germany
Fabian Lehmann
Ph.D. candidate, Humboldt-Universität zu Berlin
adaptive scheduling of large workflows
L. Thamsen
University of Glasgow, School of Computing Science, Glasgow, United Kingdom
Henning Meyerhenke
Professor of Computer Science, Karlsruhe Institute of Technology (KIT)
Scalable graph algorithms, algorithmic network analysis, combinatorial scientific computing
O. Kao
TU Berlin, DOS Group, Berlin, Germany