Evaluating the Process Modeling Abilities of Large Language Models -- Preliminary Foundations and Results

📅 2025-03-14
🤖 AI Summary
Existing LLM-based process modeling evaluation suffers from several limitations: overreliance on output quality while neglecting computational cost, latency, and multi-objective trade-offs; ambiguous quality definitions; difficulties in result verification; poor generalizability; and data leakage risks. To address these, we propose the first multidimensional evaluation framework specifically designed for business process modeling with LLMs. It integrates process mining metrics, multi-objective optimization theory, and controlled empirical experiments to systematically quantify the quality–cost–time trade-off. We introduce, for the first time, a Pareto-optimality-based evaluation paradigm that replaces conventional single-objective, quality-centric assessment. Furthermore, we identify and formally characterize six fundamental challenges, establishing rigorous methodological prerequisites for scientific evaluation. This work lays the foundational methodology for building reproducible, verifiable, and robust LLM-based process modeling benchmarks.

📝 Abstract
Large language models (LLMs) have revolutionized the processing of natural language. Although first benchmarks of the process modeling abilities of LLMs are promising, it is currently under debate to what extent an LLM can generate good process models. In this contribution, we argue that evaluating the process modeling abilities of LLMs is far from trivial; hence, available evaluation results must be interpreted with caution. For example, even in a simple scenario, not only the quality of a model should be taken into account, but also the cost and time needed for generation. Thus, an LLM does not generate one optimal solution, but a set of Pareto-optimal variants. Moreover, several further challenges have to be taken into account, e.g. the conceptualization of quality, the validation of results, generalizability, and data leakage. We discuss these challenges in detail and outline future experiments to tackle them scientifically.
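The Pareto-optimal view from the abstract can be made concrete: rather than ranking generation runs by quality alone, one keeps every run that is not dominated on all three objectives at once (quality maximized; cost and time minimized). A minimal sketch of that selection, using hypothetical model names, quality scores, costs, and runtimes that do not come from the paper:

```python
from typing import List, Tuple

# A variant is (name, quality, cost, time): quality is maximized,
# cost and time are minimized.
Variant = Tuple[str, float, float, float]

def pareto_front(variants: List[Variant]) -> List[str]:
    """Return the names of all non-dominated (Pareto-optimal) variants."""
    def dominates(a: Variant, b: Variant) -> bool:
        # a dominates b if it is at least as good on every objective
        # and strictly better on at least one.
        _, qa, ca, ta = a
        _, qb, cb, tb = b
        at_least_as_good = qa >= qb and ca <= cb and ta <= tb
        strictly_better = qa > qb or ca < cb or ta < tb
        return at_least_as_good and strictly_better

    return [v[0] for v in variants
            if not any(dominates(other, v) for other in variants)]

# Hypothetical generation runs: (variant, quality score, cost in $, time in s).
runs = [
    ("model-small",  0.70, 0.01,  2.0),
    ("model-large",  0.90, 0.10, 10.0),
    ("model-medium", 0.80, 0.05,  5.0),
    ("model-waste",  0.65, 0.08,  9.0),  # worse than model-small on all three
]
print(pareto_front(runs))  # ['model-small', 'model-large', 'model-medium']
```

The first three runs trade quality against cost and time, so none dominates another and all survive; the fourth is strictly worse than the cheapest run on every objective and is discarded. This is the sense in which an LLM yields a set of Pareto-optimal variants rather than one optimal model.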
Problem

Research questions and friction points this paper is trying to address.

Assessing LLMs' ability to generate effective process models.
Evaluating quality, cost, and time trade-offs in process model generation.
Addressing challenges such as the conceptualization of quality and the validation of results.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Evaluates LLM process modeling abilities scientifically
Considers Pareto-optimal variants in model generation
Addresses quality, validation, and generalizability challenges
Peter Fettke
DFKI, Saarland University
Business Informatics · Conceptual Modeling · Process Mining
Constantin Houy
German Research Center for Artificial Intelligence (DFKI) and Saarland University, Saarbrücken, Germany