Assessing the Business Process Modeling Competences of Large Language Models

📅 2026-01-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the lack of systematic evaluation of how well large language models (LLMs) generate BPMN process models, and the neglect of established quality dimensions in existing research. It proposes BEF4LLM, a novel evaluation framework for BPMN modeling that assesses LLM output with standardized metrics along four dimensions (syntactic quality, pragmatic quality, semantic quality, and validity) and benchmarks open-source LLMs against human modeling experts. The results show that LLMs excel in syntactic and pragmatic quality, while human experts score higher on semantic quality, though only by a modest margin; validity and semantic quality remain the main challenges for LLMs. These findings point to the practical potential of LLMs for business process modeling tasks.

📝 Abstract

The creation of Business Process Model and Notation (BPMN) models is a complex and time-consuming task requiring both domain knowledge and proficiency in modeling conventions. Recent advances in large language models (LLMs) have significantly expanded the possibilities for generating BPMN models directly from natural language, building upon earlier text-to-process methods with enhanced capabilities in handling complex descriptions. However, there is a lack of systematic evaluations of LLM-generated process models. Current efforts either use LLM-as-a-judge approaches or do not consider established dimensions of model quality. To this end, we introduce BEF4LLM, a novel LLM evaluation framework comprising four perspectives: syntactic quality, pragmatic quality, semantic quality, and validity. Using BEF4LLM, we conduct a comprehensive analysis of open-source LLMs and benchmark their performance against human modeling experts. Results indicate that LLMs excel in syntactic and pragmatic quality, while humans outperform them in semantic aspects; however, the differences in scores are relatively modest, highlighting LLMs' competitive potential despite challenges in validity and semantic quality. These insights highlight current strengths and limitations of using LLMs for BPMN modeling and can guide future model development and fine-tuning. Addressing these areas is essential for advancing the practical deployment of LLMs in business process modeling.
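
To make the validity and syntactic perspectives more concrete, the sketch below shows one kind of automated check that could be run on LLM-generated BPMN XML. It is an illustration only and is not taken from BEF4LLM: the function name `check_bpmn`, the sample process, and the two rules it applies (the output must parse as XML, and every sequence flow must reference elements that exist) are assumptions chosen for the example. The abstract does not specify the framework's actual metrics.

```python
import xml.etree.ElementTree as ET

# Namespace of the BPMN 2.0 model elements.
BPMN_NS = "http://www.omg.org/spec/BPMN/20100524/MODEL"


def check_bpmn(xml_text):
    """Hypothetical check, not the paper's metric.

    Returns (is_valid, issues):
      is_valid - True if the text parses as well-formed XML
      issues   - sequence flows whose sourceRef/targetRef point nowhere
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return False, [f"not well-formed XML: {exc}"]

    issues = []
    for process in root.iter(f"{{{BPMN_NS}}}process"):
        # Collect the ids of all elements declared inside this process.
        ids = {el.get("id") for el in process.iter() if el.get("id")}
        for flow in process.iter(f"{{{BPMN_NS}}}sequenceFlow"):
            for attr in ("sourceRef", "targetRef"):
                ref = flow.get(attr)
                if ref not in ids:
                    issues.append(f"{flow.get('id')}: {attr} -> {ref!r} not found")
    return True, issues


# Illustrative model with one dangling reference ("ship" is never declared).
SAMPLE = f"""<definitions xmlns="{BPMN_NS}">
  <process id="p1">
    <startEvent id="start"/>
    <task id="review" name="Review order"/>
    <endEvent id="end"/>
    <sequenceFlow id="f1" sourceRef="start" targetRef="review"/>
    <sequenceFlow id="f2" sourceRef="review" targetRef="ship"/>
  </process>
</definitions>"""

if __name__ == "__main__":
    print(check_bpmn(SAMPLE))
```

A check like this covers only structural soundness. Semantic quality (whether the model reflects the process described in the text) and pragmatic quality (whether it is understandable) would need different measures, such as comparison against a reference model.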

Problem

Research questions and friction points this paper is trying to address.

Business Process Modeling

Large Language Models

BPMN

Model Quality Evaluation

Natural Language to Process

Innovation

Methods, ideas, or system contributions that make the work stand out.

BEF4LLM

Business Process Modeling

Large Language Models