🤖 AI Summary
Traditional sequence-based predictive process monitoring (PPM) methods, such as RNNs and Transformers, have limited capacity to model complex, long-running, highly cyclic business processes. To address this, we propose DFG-GNN, a graph neural network framework that converts process event logs from sequential traces into Directly-Follows Graphs (DFGs) and systematically integrates multiple GNN paradigms (e.g., GCN, GAT, EdgeCNN) to jointly model nodes, edges, and multi-graph structures. We further design a process-aware event embedding scheme and a graph-level prediction architecture to minimize information loss during trace-to-graph conversion. Extensive experiments on real-world process logs demonstrate that DFG-GNN achieves average improvements of 7.2% in F1-score (for outcome prediction) and 5.8% in MAE (for remaining time prediction) over state-of-the-art sequence models. To the best of our knowledge, this is the first work to systematically investigate DFG representation learning coupled with heterogeneous GNNs for PPM.
📝 Abstract
In recent years, predictive process monitoring (PPM) techniques based on artificial neural networks have emerged as a method for monitoring the future behavior of business processes. Existing approaches mostly interpret the processes as sequences, so-called traces, and feed them to neural architectures designed to operate on sequential data, such as recurrent neural networks (RNNs) or transformers. In this study, we investigate an alternative way to perform PPM: by transforming each process into its directly-follows graph (DFG) representation, we are able to apply graph neural networks (GNNs) to the prediction tasks. In this way, we aim to develop models that are better suited to complex processes that are long and contain many loops. In particular, we present different ways to create DFG representations, depending on the particular GNN used. The tested GNNs range from classical node-based to novel edge-based architectures. Further, we investigate the possibility of using multi-graphs. Through these steps, we aim to design graph representations that minimize the information loss incurred when transforming traces into graphs.
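To make the trace-to-graph idea concrete, here is a minimal sketch of how a single trace could be collapsed into a directly-follows graph, alongside a multi-graph variant that keeps one edge per transition. This is an illustration only, not the construction used in the paper: the function names `trace_to_dfg` and `trace_to_multigraph`, the example activities, and the choice of edge counts and positions as edge attributes are all assumptions.

```python
from collections import Counter


def trace_to_dfg(trace):
    """Collapse one trace into a directly-follows graph (hypothetical helper).

    Returns (nodes, edges): nodes lists the distinct activities in order of
    first occurrence; edges maps (source, target) to the number of times
    target directly followed source in the trace.
    """
    nodes = list(dict.fromkeys(trace))       # de-duplicate, keep order
    edges = Counter(zip(trace, trace[1:]))   # count consecutive pairs
    return nodes, edges


def trace_to_multigraph(trace):
    """Keep one edge per transition, tagged with its position in the trace.

    Assumed variant: parallel edges preserve the ordering that the
    aggregated DFG above discards.
    """
    return [(src, dst, pos)
            for pos, (src, dst) in enumerate(zip(trace, trace[1:]))]


# Made-up example trace containing a loop between "check" and "repair".
trace = ["register", "check", "repair", "check", "repair", "check", "close"]
nodes, edges = trace_to_dfg(trace)
multi_edges = trace_to_multigraph(trace)
```

In this example the loop collapses to two weighted edges in the plain DFG, so the order in which loop iterations occurred is no longer recoverable. The multi-graph variant keeps all six transitions with their positions, which is one way to picture the information-loss trade-off the abstract refers to.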