🤖 AI Summary
This study addresses the challenges of automatically reconstructing BPMN models from unstructured natural-language descriptions: heterogeneous modeling conventions, multilingual inputs, and the absence of ground-truth references. The authors propose a multi-stage pipeline driven by a large language model (LLM) that combines multilingual translation, execution-oriented validation in SpiffWorkflow, and LLM-guided iterative repair to build a corpus of validated, executable BPMN 2.0 XML ground-truth models. Process descriptions generated from these models are then used to reconstruct BPMN diagrams without manual curation. Reconstructions are scored with a multi-dimensional similarity framework that combines structural metrics, type-distribution alignment, and embedding-based semantic measures. In a study of 750 public BPMN diagrams, the pipeline produced 387 validated ground-truth models and achieved an average reconstruction similarity above 0.75, including approximately 50 near-perfect reconstructions that differ only in minor naming variations.
📝 Abstract
Automatically reconstructing BPMN models from unstructured natural-language descriptions remains challenging due to heterogeneous modeling conventions, multilingual sources, and the lack of reliable ground truth. We present a scalable, multi-stage LLM-driven pipeline that automates both ground-truth construction and model reconstruction. Multilingual BPMN XML files are translated into English, validated using execution-oriented compliance checks in SpiffWorkflow, and iteratively repaired through targeted LLM-guided corrections to produce a consistent ground-truth corpus. From these validated models, process descriptions are generated and used to reconstruct executable BPMN 2.0 XML diagrams without manual curation. We introduce a multi-dimensional similarity framework combining structural metrics, type-distribution alignment, and embedding-based semantic measures. In an empirical study of 750 public BPMN diagrams, the pipeline generated 387 validated ground-truth models and achieved average reconstruction similarity above 0.75, including approximately 50 near-perfect reconstructions differing only in minor naming variations. The results demonstrate that LLMs can generate structurally compliant and semantically meaningful BPMN diagrams at scale.
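The abstract does not say how the type-distribution component of the similarity framework is computed. As a rough illustration only, the sketch below assumes one plausible reading: count the BPMN element types that appear directly under each `process` element and compare the two count vectors with cosine similarity. The function names `type_counts` and `cosine_similarity`, the choice of cosine similarity, and the sample diagrams are all assumptions for this example, not the authors' method. The namespace constant is the standard BPMN 2.0 model namespace.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter

# Standard BPMN 2.0 model namespace.
BPMN_NS = "http://www.omg.org/spec/BPMN/20100524/MODEL"


def type_counts(bpmn_xml: str) -> Counter:
    """Count element types (local tag names) directly under each <process>."""
    root = ET.fromstring(bpmn_xml)
    counts = Counter()
    for process in root.iter(f"{{{BPMN_NS}}}process"):
        for child in process:
            # Drop the "{namespace}" prefix that ElementTree puts on tags.
            counts[child.tag.rsplit("}", 1)[-1]] += 1
    return counts


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity of two count vectors; 0.0 if either is empty."""
    dot = sum(a[k] * b[k] for k in set(a) | set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical ground-truth diagram: start -> task -> end.
REFERENCE_XML = """<definitions xmlns="http://www.omg.org/spec/BPMN/20100524/MODEL">
  <process id="p1">
    <startEvent id="s" name="Order received"/>
    <task id="t" name="Check stock"/>
    <endEvent id="e" name="Done"/>
    <sequenceFlow id="f1" sourceRef="s" targetRef="t"/>
    <sequenceFlow id="f2" sourceRef="t" targetRef="e"/>
  </process>
</definitions>"""

# Hypothetical reconstruction: renamed task plus an extra gateway and flow.
RECONSTRUCTED_XML = """<definitions xmlns="http://www.omg.org/spec/BPMN/20100524/MODEL">
  <process id="p1">
    <startEvent id="s" name="Order received"/>
    <task id="t" name="Verify inventory"/>
    <exclusiveGateway id="g"/>
    <endEvent id="e" name="Done"/>
    <sequenceFlow id="f1" sourceRef="s" targetRef="t"/>
    <sequenceFlow id="f2" sourceRef="t" targetRef="g"/>
    <sequenceFlow id="f3" sourceRef="g" targetRef="e"/>
  </process>
</definitions>"""

if __name__ == "__main__":
    score = cosine_similarity(
        type_counts(REFERENCE_XML), type_counts(RECONSTRUCTED_XML)
    )
    print(f"type-distribution similarity: {score:.3f}")
```

Because this measure looks only at element types, two diagrams that differ solely in element names score as identical on it. That is consistent with the abstract pairing it with structural and embedding-based semantic measures, which capture what a type histogram cannot.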