Mind the Gap: A Formal Investigation of the Relationship Between Log and Model Complexity -- Extended Version

📅 2025-05-29
🤖 AI Summary
This study investigates whether the complexity of an event log reliably predicts the complexity of the process model discovered from it. Method: The study examines whether formal guarantees relate 18 log complexity measures to 17 model complexity measures across five process discovery algorithms. Contribution/Results: Only the complexity of the flower model can be established from a log complexity measure. For the other algorithms, the study investigates which log complexity measures influence the complexity of the discovered models, and finds that current log complexity measures are insufficient for choosing a discovery algorithm that yields simple models. The authors propose that designers of process discovery algorithms state which log complexity measures predict the complexity of their results.

📝 Abstract
Simple process models are key for effectively communicating the outcomes of process mining. An important question in this context is whether the complexity of event logs used as inputs to process discovery algorithms can serve as a reliable indicator of the complexity of the resulting process models. Although various complexity measures for both event logs and process models have been proposed in the literature, the relationship between input and output complexity remains largely unexplored. In particular, there are no established guidelines or theoretical foundations that explain how the complexity of an event log influences the complexity of the discovered model. This paper examines whether formal guarantees exist such that increasing the complexity of event logs leads to increased complexity in the discovered models. We study 18 log complexity measures and 17 process model complexity measures across five process discovery algorithms. Our findings reveal that only the complexity of the flower model can be established by an event log complexity measure. For all other algorithms, we investigate which log complexity measures influence the complexity of the discovered models. The results show that current log complexity measures are insufficient to decide which discovery algorithms to choose to construct simple models. We propose that authors of process discovery algorithms provide insights into which log complexity measures predict the complexity of their results.
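The flower-model result can be illustrated with a minimal sketch. It assumes a simplified flower model (one place, plus one looping transition per activity), takes "variety" (the number of distinct activities) as the log complexity measure, and takes "size" (the number of nodes) as the model complexity measure. These definitions and the names `variety`, `flower_model`, and `size` are illustrative assumptions, not the paper's exact construction or measures.

```python
from typing import List, Set, Tuple

Trace = List[str]
EventLog = List[Trace]
# A Petri net as (places, transitions, arcs); a hypothetical minimal encoding.
PetriNet = Tuple[Set[str], Set[str], Set[Tuple[str, str]]]


def variety(log: EventLog) -> int:
    """Log complexity (assumed measure): number of distinct activities."""
    return len({activity for trace in log for activity in trace})


def flower_model(log: EventLog) -> PetriNet:
    """Build a simplified flower model: a single place 'p' and one
    transition per activity that consumes from and produces to 'p'."""
    activities = {activity for trace in log for activity in trace}
    places = {"p"}
    transitions = set(activities)
    arcs = {("p", a) for a in activities} | {(a, "p") for a in activities}
    return places, transitions, arcs


def size(model: PetriNet) -> int:
    """Model complexity (assumed measure): number of places and transitions."""
    places, transitions, _ = model
    return len(places) + len(transitions)


# Two logs with different traces but the same set of activities.
log_a: EventLog = [["a", "b", "c"]]
log_b: EventLog = [["a", "b", "c"], ["c", "b", "a"], ["a", "a", "b", "c"]]

print(variety(log_a), size(flower_model(log_a)))
print(variety(log_b), size(flower_model(log_b)))
```

Under these assumptions, the size of the flower model is always `variety(log) + 1`, so it depends on the log only through that one measure. Discovery algorithms whose output depends on trace ordering or frequencies do not admit such a direct relationship in this sketch.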
Problem

Research questions and friction points this paper is trying to address.

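Investigates whether event log complexity reliably indicates the complexity of the discovered process model
Assesses 18 log complexity measures and 17 model complexity measures across five process discovery algorithms
Finds current log complexity measures insufficient for choosing a discovery algorithm that yields simple models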
Innovation

Methods, ideas, or system contributions that make the work stand out.

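Examines whether formal guarantees link increasing log complexity to increasing model complexity
Shows that only the complexity of the flower model can be established by a log complexity measure
Proposes that authors of discovery algorithms report which log complexity measures predict the complexity of their results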