An Empirical Investigation on the Challenges in Scientific Workflow Systems Development

📅 2024-11-16
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Background: Scientific workflow systems (SWSs) face persistent practical challenges in development, yet empirical evidence characterizing these difficulties across developer communities remains scarce. Method: The study analyzes data from two platforms together: developer questions from Stack Overflow and issue discussions from GitHub. It applies BERTopic for unsupervised topic modeling and codes each topic's posts by question type (How, Why, What, Others). Contribution/Results: The analysis identifies 10 topics on Stack Overflow and 13 on GitHub. Workflow execution is the most challenging topic on Stack Overflow, while data structures and operations is the most difficult on GitHub. “How”-type questions dominate across all topics. Topics shared by both platforms include data structures and operations, task management, and workflow scheduling; together, these findings provide empirically grounded insights to inform SWS design improvements, toolchain enhancements, and targeted developer support strategies.

📝 Abstract
Scientific Workflow Systems (SWSs) are advanced software frameworks that drive modern research by orchestrating complex computational tasks and managing extensive data pipelines. These systems offer a range of essential features, including modularity, abstraction, interoperability, workflow composition tools, resource management, error handling, and comprehensive documentation. Utilizing these frameworks accelerates the development of scientific computing, resulting in more efficient and reproducible research outcomes. However, developing a user-friendly, efficient, and adaptable SWS poses several challenges. This study explores these challenges through an in-depth analysis of interactions on Stack Overflow (SO) and GitHub, key platforms where developers and researchers discuss and resolve issues. In particular, we leverage topic modeling (BERTopic) to understand the topics SWS developers discuss on these platforms. We identified 10 topics developers discuss on SO (e.g., Workflow Creation and Scheduling, Data Structures and Operations, Workflow Execution) and found that Workflow Execution is the most challenging. By analyzing GitHub issues, we identified 13 topics (e.g., Errors and Bug Fixing, Documentation, Dependencies) and found that Data Structures and Operations is the most difficult. We also found topics common to SO and GitHub, such as data structures and operations, task management, and workflow scheduling. Additionally, we categorized each topic by type (How, Why, What, and Others). We observed that the How type consistently dominates across all topics, indicating a need for procedural guidance among developers. A similar dominance of the How type has also been observed in other domains, such as chatbot and mobile development. Our study will guide future research in proposing tools and techniques to help the community overcome the challenges developers face when developing SWSs.
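The abstract reports that How-type questions dominate every topic, but it does not spell out how posts were assigned a type. The Python sketch below is therefore only an illustration under stated assumptions: it uses a hypothetical keyword heuristic (the `RULES` patterns and the `question_type` and `dominant_type_per_topic` helpers are invented names, not the study's procedure) to show how posts already grouped into topics, for example by BERTopic, could be labelled How, Why, What, or Others, and how the dominant type and its share per topic could then be computed.

```python
# Hypothetical sketch: keyword rules for labelling question titles as
# How / Why / What / Others, then finding each topic's dominant type.
# The rules are assumptions for illustration, not the study's method.
import re
from collections import Counter

# Assumed cue patterns, checked in order; the first matching rule wins.
RULES = [
    ("How", re.compile(r"\bhow\b|\bway to\b", re.IGNORECASE)),
    ("Why", re.compile(r"\bwhy\b", re.IGNORECASE)),
    ("What", re.compile(r"\bwhat\b|\bwhich\b", re.IGNORECASE)),
]


def question_type(title: str) -> str:
    """Label a question title as 'How', 'Why', 'What', or 'Others'."""
    for label, pattern in RULES:
        if pattern.search(title):
            return label
    return "Others"


def dominant_type_per_topic(posts):
    """Map each topic to (most common question type, its share of posts).

    `posts` is an iterable of (topic, title) pairs. Ties between types are
    broken by first appearance, which is how Counter.most_common orders
    elements with equal counts.
    """
    counts = {}
    for topic, title in posts:
        counts.setdefault(topic, Counter())[question_type(title)] += 1
    summary = {}
    for topic, counter in counts.items():
        label, n = counter.most_common(1)[0]
        summary[topic] = (label, n / sum(counter.values()))
    return summary


if __name__ == "__main__":
    # Invented example titles; the topic names follow the abstract.
    posts = [
        ("Workflow Execution", "How do I resume a failed workflow run?"),
        ("Workflow Execution", "Why does my task hang on the cluster?"),
        ("Workflow Execution", "How to pass parameters to a step"),
        ("Data Structures and Operations",
         "What is the difference between a list and a channel?"),
    ]
    for topic, (label, share) in dominant_type_per_topic(posts).items():
        print(f"{topic}: {label} ({share:.0%})")
```

A rule-based labeller like this is only a rough first pass: a title such as “Why is there no way to ...” would be labelled How because the How rule is checked first, so any real analysis would need manual checking of the labels.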
Problem

Research questions and friction points this paper is trying to address.

Identify challenges in Scientific Workflow Systems development
Analyze developer discussions on Stack Overflow and GitHub
Explore topics shared across both platforms, such as data structures and operations, task management, and workflow scheduling
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leveraging BERTopic for topic modeling
Analyzing Stack Overflow and GitHub interactions
Categorizing topics by question type (How, Why, What, Others)
Khairul Alam
Department of Computer Science, University of Saskatchewan, Saskatoon, S7N 5A2, Saskatchewan, Canada.
Chanchal Roy
Professor, University of Saskatchewan
Software Engineering, software maintenance and evolution, reengineering
Banani Roy
University of Saskatchewan
Interactive Software Engineering, Big Data Analytics, Software Maintenance, Scientific Workflows
Kartik Mittal
Department of Computer Science, University of Saskatchewan, Saskatoon, S7N 5A2, Saskatchewan, Canada.