Do Large Language Models Speak Scientific Workflows?

📅 2024-12-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) perform poorly on scientific workflow tasks, including configuration, annotation, translation, explanation, and generation, primarily because they lack domain knowledge of scientific workflows.
Method: This work presents the first systematic evaluation of over 20 open- and closed-source LLMs (e.g., Llama, GPT series) across mainstream workflow systems (e.g., Apache Airflow, Snakemake), using customized prompts and a multidimensional evaluation protocol tailored to workflow semantics and execution constraints.
Results: LLM accuracy on workflow tasks is substantially lower than on general NLP benchmarks, and performance differs by over 40% across workflow systems, indicating that capabilities are highly sensitive to both task type and system architecture. The study identifies insufficient domain knowledge as the fundamental bottleneck and proposes transferable prompt optimization strategies and domain alignment techniques. It establishes the first empirical benchmark and methodological framework for applying LLMs to research automation.

📝 Abstract
With the advent of large language models (LLMs), there is growing interest in applying LLMs to scientific tasks. In this work, we conduct an experimental study to explore the applicability of LLMs for configuring, annotating, translating, explaining, and generating scientific workflows. We use five workflow-specific experiments and evaluate several open- and closed-source language models using state-of-the-art workflow systems. Our studies reveal that LLMs often struggle with workflow-related tasks due to their lack of knowledge of scientific workflows. We further observe that the performance of LLMs varies across experiments and workflow systems. Our findings can help workflow developers and users understand LLMs' capabilities in scientific workflows, and motivate further research on applying LLMs to workflows.
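The study's design crosses models, five task types, and workflow systems, and reports that performance varies across experiments and systems. The sketch below shows one way such a grid of results could be aggregated to quantify cross-system variation. It is a minimal illustration under stated assumptions, not the authors' evaluation code: the `summarize` function, the short task labels, the per-item score in [0, 1], and the use of a max-min spread in percentage points are all hypothetical choices made here for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical short labels for the five task types named in the abstract.
TASKS = ("configure", "annotate", "translate", "explain", "generate")


def summarize(results):
    """Aggregate (model, system, task, score) records.

    Assumption: each score is a number in [0, 1]; the paper's actual
    metrics are not specified here.
    Returns (per-system mean score, max-min spread across systems in
    percentage points).
    """
    by_system = defaultdict(list)
    for model, system, task, score in results:
        if task not in TASKS:
            raise ValueError(f"unknown task: {task}")
        by_system[system].append(score)
    if not by_system:
        raise ValueError("no results to summarize")
    # Mean score per workflow system, pooled over models and tasks.
    means = {system: mean(scores) for system, scores in by_system.items()}
    # Absolute gap between the best and worst system, in percentage points.
    spread = 100 * (max(means.values()) - min(means.values()))
    return means, spread
```

Note that a "40%" difference can mean either an absolute gap in percentage points (as computed here) or a relative gap; the two can differ substantially, so any such figure should state which is meant.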
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Scientific Workflow
Domain-specific Knowledge
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large Language Models
Scientific Workflow Tasks
Performance Evaluation