🤖 AI Summary
Large language models (LLMs) face challenges in process modeling (PMo) tasks, such as process model generation (PMG), due to the heterogeneity of process model representations (PMRs) and the absence of systematic, cross-representation evaluation frameworks. Method: We conduct the first empirical, cross-PMR study: (i) curating the PMo Dataset, a benchmark of 55 process descriptions paired with models in nine PMRs; (ii) establishing a unified evaluation framework; and (iii) systematically comparing nine mainstream PMRs, including Mermaid and BPMN text, along two dimensions: suitability for LLM-based PMo and PMG performance. Contribution/Results: We propose a multi-dimensional evaluation metric suite capturing structural readability, element fidelity, and LLM adaptability. Results reveal trade-offs across PMRs: Mermaid achieves the best overall suitability, while BPMN text excels in process-element similarity. This work establishes a reproducible benchmark and provides principled guidance for PMR selection in LLM-driven process modeling.
📝 Abstract
Large Language Models (LLMs) are increasingly applied for Process Modeling (PMo) tasks such as Process Model Generation (PMG). To support these tasks, researchers have introduced a variety of Process Model Representations (PMRs) that serve as model abstractions or generation targets. However, these PMRs differ widely in structure, complexity, and usability, and have never been systematically compared. Moreover, recent PMG approaches rely on distinct evaluation strategies and generation techniques, making comparison difficult. This paper presents the first empirical study that evaluates multiple PMRs in the context of PMo with LLMs. We introduce the PMo Dataset, a new dataset containing 55 process descriptions paired with models in nine different PMRs. We evaluate PMRs along two dimensions: suitability for LLM-based PMo and performance on PMG. *Mermaid* achieves the highest overall score across six PMo criteria, whereas *BPMN text* delivers the best PMG results in terms of process-element similarity.
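To make "process-element similarity" concrete, here is a minimal sketch, not the paper's actual metric (whose definition is not reproduced here): it treats two tiny Mermaid flowcharts as generated and reference models, extracts activity labels and edges, and scores their overlap with a set-based F1. The `parse_mermaid` and `f1` helpers, the regexes, and the example snippets are all illustrative assumptions.

```python
import re

def parse_mermaid(src: str):
    """Extract activity labels and directed edges from a tiny Mermaid flowchart.
    Illustrative only: handles the `A[Label] --> B` form, not full Mermaid syntax."""
    nodes = dict(re.findall(r"(\w+)\[([^\]]+)\]", src))
    edges = re.findall(r"(\w+)(?:\[[^\]]*\])?\s*-->\s*(\w+)", src)
    # Compare edges by activity label rather than by node ID.
    labeled_edges = {(nodes.get(a, a), nodes.get(b, b)) for a, b in edges}
    return set(nodes.values()), labeled_edges

def f1(pred: set, gold: set) -> float:
    """Set-overlap F1: harmonic mean of precision and recall."""
    if not pred or not gold:
        return 0.0
    tp = len(pred & gold)
    p, r = tp / len(pred), tp / len(gold)
    return 2 * p * r / (p + r) if p + r else 0.0

reference = """flowchart TD
    A[Receive order] --> B[Check stock]
    B --> C[Ship order]"""

generated = """flowchart TD
    A[Receive order] --> B[Check stock]
    B --> D[Cancel order]"""

ref_acts, ref_edges = parse_mermaid(reference)
gen_acts, gen_edges = parse_mermaid(generated)
print(f"activity F1: {f1(gen_acts, ref_acts):.2f}")  # 0.67
print(f"edge F1:     {f1(gen_edges, ref_edges):.2f}")  # 0.50
```

A set-based F1 of this kind ignores ordering, gateways, and control-flow semantics; the paper's metric suite presumably adds structural and readability criteria beyond simple element overlap.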