A Terminology for Scientific Workflow Systems

📅 2025-06-09
🤖 AI Summary
The workflow management system (WMS) domain has long suffered from a lack of standardized terminology, impeding objective system selection, reproducible evaluation, and interoperability. To address this, we propose the first “Five-Axis Standardized Terminology Framework,” systematically characterizing WMS capabilities across five dimensions: workflow characteristics, orchestration, composition, data management, and metadata capture. Leveraging conceptual modeling, cross-system feature analysis, expert consensus workshops, and empirical classification, we establish the first community-endorsed terminology taxonomy. We apply this framework to perform structured capability mapping and classification of 23 widely adopted WMSs. Our framework shifts WMS evaluation from anecdotal, experience-driven practices toward comparable, reproducible, and evidence-based decision-making. It enhances transparency in system selection and lays a foundational semantic basis for interoperability across heterogeneous scientific workflows.

📝 Abstract
The term scientific workflow has evolved over the last two decades to encompass a broad range of compositions of interdependent compute tasks and data movements. It has also become an umbrella term for processing in modern scientific applications. Today, many scientific applications can be considered as workflows made of multiple dependent steps, and hundreds of workflow management systems (WMSs) have been developed to manage and run these workflows. However, no turnkey solution has emerged to address the diversity of scientific processes and the infrastructure on which they are implemented. Instead, new research problems requiring the execution of scientific workflows with some novel feature often lead to the development of an entirely new WMS. A direct consequence is that many existing WMSs share some salient features, offer similar functionalities, and can manage the same categories of workflows, yet also have some distinct capabilities. As a result, researchers who develop workflows face the complex question of selecting a WMS. This selection can be driven by technical considerations, to find the system that is the most appropriate for their application and for the resources available to them, or by other factors such as reputation, adoption, strong community support, or long-term sustainability. To address this problem, a group of WMS developers and practitioners joined efforts to produce a community-based terminology of WMSs. This paper summarizes their findings and introduces this new terminology to characterize WMSs. This terminology is composed of five axes: workflow characteristics, composition, orchestration, data management, and metadata capture. Each axis comprises several concepts that capture the prominent features of WMSs. Based on this terminology, this paper also presents a classification of 23 existing WMSs according to the proposed axes and terms.
Problem

Research questions and friction points this paper is trying to address.

Diverse scientific workflows lack standardized terminology.
Many workflow management systems share features but differ in capabilities.
Researchers struggle to select appropriate workflow management systems.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Developed community-based WMS terminology
Classified 23 WMSs using five axes
Addressed WMS diversity and selection complexity
Frédéric Suter
Oak Ridge National Laboratory, TN, USA
T. Coleman
University of California, San Diego, CA, USA
İlkay Altintaş
University of California, San Diego, CA, USA
Rosa M. Badia
Barcelona Supercomputing Center (BSC), Universitat Politècnica de Catalunya (UPC)
Parallel Programming models, Workflow systems, Distributed Computing, Edge to Fog Computing, HPC-AI
B. Baliś
AGH University of Krakow, Krakow, Poland
Kyle Chard
University of Chicago and Argonne National Laboratory
computer science, distributed systems, high performance computing, scientific computing
Iacopo Colonnelli
Università di Torino
Workflows, High Performance Computing, Parallel Computing, Distributed Computing
Ewa Deelman
University of Southern California, Information Sciences Institute
distributed computing, cloud computing, workflow management
Paolo Di Tommaso
Seqera
Computer Science, Bioinformatics
Thomas Fahringer
University of Innsbruck, Institute of Computer Science, Distributed and Parallel Systems Group
Parallel Systems, Parallelizing Compilers, Heterogeneous energy-aware computing, Cloud and Grid Computing
Carole A. Goble
University of Manchester, Manchester, United Kingdom
S. Jha
Rutgers University-New Brunswick; Princeton Plasma Physics Laboratory; Princeton University, NJ, USA
Daniel S. Katz
NCSA, CS, iSchool @ UIUC
Parallel and Distributed Software & Applications, eScience, Cyberinfrastructure, Sustainability
Johannes Köster
University of Duisburg-Essen, Essen, Germany
Ulf Leser
Institute for Computer Science, Humboldt-Universität zu Berlin, Berlin, Germany
Kshitij Mehta
Computer Scientist, Oak Ridge National Lab
Hilary Oliver
National Institute of Water and Atmospheric Research, Wellington, New Zealand
J. Peterson
Giovanni Pizzi
Laboratory for Materials Simulations, Paul Scherrer Institute (PSI), Villigen PSI, Switzerland
Solid-state Physics, Materials Science, Materials simulations
L. Pottier
Raül Sirvent
Established Researcher, Barcelona Supercomputing Center
Programming Models, High Performance Computing, Distributed Computing, Workflows, Provenance
E. Suchyta
Oak Ridge National Laboratory, TN, USA
D. Thain
University of Notre Dame, Notre Dame, IN, USA
Sean R. Wilkinson
Research Scientist, Oak Ridge National Laboratory
Bioinformatics, Data Science, High Performance Computing, FAIR, Workflows
Justin M. Wozniak
Argonne National Laboratory, Lemont, IL, USA
Rafael Ferreira da Silva
Oak Ridge National Laboratory
Scientific Workflows, Distributed Computing, Workflow Management, Modeling and Simulation, High Performance Computing