🤖 AI Summary
Problem: Existing studies largely overlook the interdependencies among structural characteristics of event logs, which makes it hard to isolate their individual effects on process mining algorithm performance.
Method: We propose SHAining, a marginal contribution analysis framework that systematically quantifies how much each key structural feature (e.g., noise ratio, behavioral variability, completeness) contributes to fitness, precision, and model complexity across over 22,000 real and synthetic logs. Because log characteristics co-occur, SHAining measures associational effects rather than assuming isolated causal ones, integrating statistical modeling with interpretable analysis in process discovery.
Contribution/Results: SHAining reveals the magnitude and nonlinear patterns of feature influence, identifying the most impactful core features. It further enables robustness assessment of algorithms under varying structural conditions. The findings provide empirically grounded guidance for event log preprocessing, algorithm selection, and performance optimization—advancing both theoretical understanding and practical deployment of process mining techniques.
📝 Abstract
Process mining aims to extract and analyze insights from event logs, yet the metric results of process mining algorithms vary widely depending on the structural characteristics of the event log. Existing work often evaluates algorithms on a fixed set of real-world event logs but lacks a systematic analysis of how individual event log characteristics impact algorithms. Moreover, because event logs are generated from processes in which characteristics co-occur, we focus on associational rather than causal effects: we assess how strongly each of the overlapping characteristics affects evaluation metrics without assuming isolated causal effects, an aspect often neglected by prior work. We introduce SHAining, the first approach to quantify the marginal contribution of varying event log characteristics to process mining algorithms' metrics. Using process discovery as a downstream task, we analyze over 22,000 event logs covering a wide span of characteristics to uncover which ones most affect algorithms across metrics (e.g., fitness, precision, complexity). Furthermore, we offer novel insights into how the value of an event log characteristic correlates with its contributed impact, which allows assessing an algorithm's robustness.
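The abstract does not spell out how a marginal contribution is computed. The name SHAining hints at Shapley-value attribution, but that reading is an assumption rather than something stated above. As a minimal sketch under that assumption, the Python snippet below computes exact Shapley values for a hypothetical toy scoring function. The function `toy_fitness`, the characteristic names, and every number in it are invented for illustration and are not taken from the paper.

```python
from itertools import combinations
from math import factorial


def shapley_values(features, value):
    """Exact Shapley values for a set function `value` over `features`.

    `value` maps a frozenset of feature names to a score. Each feature's
    Shapley value is its marginal contribution averaged over all coalitions
    of the remaining features, using the standard Shapley weights.
    """
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):  # k = size of the coalition without f
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of f when added to coalition s.
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


def toy_fitness(coalition):
    """Hypothetical metric score; all numbers are made up for illustration."""
    score = 0.9
    if "noise_ratio" in coalition:
        score -= 0.2
    if "variability" in coalition:
        score -= 0.1
    # Co-occurring characteristics: an extra penalty when both are present.
    if "noise_ratio" in coalition and "variability" in coalition:
        score -= 0.1
    return score


features = ["noise_ratio", "variability", "completeness"]
phi = shapley_values(features, toy_fitness)

for name, contribution in phi.items():
    print(f"{name}: {contribution:+.3f}")
```

In this toy setup the penalty that only appears when `noise_ratio` and `variability` co-occur is split evenly between the two, and the contributions sum to the difference between the score with all characteristics and the score with none. That is the sense in which such an attribution handles overlapping characteristics without claiming an isolated causal effect for any one of them. Exact enumeration grows exponentially with the number of characteristics, so a study at the scale described above would presumably rely on an approximation.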