🤖 AI Summary
In streaming process mining, fixed-size sliding windows struggle to accommodate dynamic process evolution and concept drift, leading to model bias. To address this, we propose a dynamic window optimization method grounded in species estimation theory—introducing, for the first time, sample representativeness quantification into streaming process mining. Our approach establishes a real-time representativeness assessment model under sliding windows and adaptively adjusts window size to balance timeliness and statistical sufficiency. It requires no prior knowledge and enables online detection and response to concept drift. Experiments on multiple real-world event streams demonstrate that our method significantly improves process model accuracy (average +12.7% F1-score) and robustness to concept drift (38.5% reduction in false positive rate) compared to static-window baselines. This work establishes a novel paradigm for real-time, adaptive process analysis.
📝 Abstract
Streaming process mining deals with the real-time analysis of event streams. A common approach for it is to adopt windowing mechanisms that select event data from a stream for subsequent analysis. However, the size of these windows denotes a crucial parameter, as it influences the representativeness of the window content and, by extension, of the analysis results. Given that process dynamics are subject to changes and potential concept drift, a static, fixed window size leads to inaccurate representations that introduce bias in the analysis. In this work, we present a novel approach for streaming process mining that addresses these limitations by adjusting window sizes. Specifically, we dynamically determine suitable window sizes based on estimators for the representativeness of samples as developed for species estimation in biodiversity research. Evaluation results on real-world data sets show improvements over existing approaches that adopt static window sizes in terms of accuracy and robustness to concept drifts.