🤖 AI Summary
Detecting semantic anomalies in business processes, that is, distinguishing genuinely unreasonable behavior from merely infrequent yet legitimate activity, remains challenging. Method: We propose DABL, a large language model (LLM)-based approach that processes full process traces (rather than fragmented event pairs) to preserve long-range dependencies and model global semantic plausibility. DABL fine-tunes Llama 2 on a log built from 143,137 real-world process models: normal traces are generated by playing out the models, and ordering and exclusion anomalies are then simulated. It can be applied to new datasets without additional training and generates natural-language explanations of an anomaly's cause. Results: Extensive experiments show that DABL surpasses existing state-of-the-art semantic anomaly detection methods in both generalization ability and learning of given processes.
📝 Abstract
Detecting anomalies in business processes is crucial for ensuring operational success. Many existing methods rely on statistical frequency to detect anomalies, but infrequent behavior is not necessarily undesirable. Detecting anomalies from a semantic viewpoint is a more effective way to address this problem. However, current semantic anomaly detection methods treat a trace (i.e., a process instance) as multiple event pairs, which disrupts long-distance dependencies. In this paper, we introduce DABL, a novel approach for detecting semantic anomalies in business processes using large language models (LLMs). We collect 143,137 real-world process models from various domains. We generate normal traces through the playout of these process models, simulate both ordering and exclusion anomalies, and fine-tune Llama 2 on the resulting log. Through extensive experiments, we demonstrate that DABL surpasses existing state-of-the-art semantic anomaly detection methods in terms of both generalization ability and learning of given processes. Users can directly apply DABL to detect semantic anomalies in their own datasets without additional training. Furthermore, DABL can explain the causes of anomalies in natural language, providing valuable insight into the detected anomalies.
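To make the two simulated anomaly types concrete, here is a minimal Python sketch of how an ordering anomaly and an exclusion anomaly might be injected into a normal trace. It is an illustration under stated assumptions, not the authors' implementation: the function names, the example activities, and the explicit list of mutually exclusive activity pairs are all hypothetical, and the abstract does not describe the actual simulation procedure.

```python
import random


def inject_ordering_anomaly(trace, rng):
    """Return a copy of `trace` with two different activities swapped.

    Assumption: an ordering anomaly is modeled as two activities
    occurring in reversed order relative to the normal trace.
    """
    # Only consider position pairs whose activities differ, so the swap
    # always changes the trace.
    pairs = [
        (i, j)
        for i in range(len(trace))
        for j in range(i + 1, len(trace))
        if trace[i] != trace[j]
    ]
    if not pairs:
        raise ValueError("trace needs at least two distinct activities")
    i, j = rng.choice(pairs)
    anomalous = list(trace)
    anomalous[i], anomalous[j] = anomalous[j], anomalous[i]
    return anomalous


def inject_exclusion_anomaly(trace, exclusive_pairs, rng):
    """Return a copy of `trace` that contains both activities of an exclusive pair.

    Assumption: `exclusive_pairs` lists activities from mutually exclusive
    branches of the (hypothetical) process model.
    """
    candidates = []
    for a, b in exclusive_pairs:
        if a in trace and b not in trace:
            candidates.append((a, b))
        elif b in trace and a not in trace:
            candidates.append((b, a))
    if not candidates:
        raise ValueError("no exclusive pair applies to this trace")
    present, excluded = rng.choice(candidates)
    anomalous = list(trace)
    # Insert the excluded activity right after the one that did occur.
    anomalous.insert(trace.index(present) + 1, excluded)
    return anomalous


# Hypothetical example data, not taken from the paper.
normal = ["receive order", "check stock", "approve order", "ship goods"]
exclusive = [("approve order", "reject order")]
rng = random.Random(0)

ordering_anomaly = inject_ordering_anomaly(normal, rng)
exclusion_anomaly = inject_exclusion_anomaly(normal, exclusive, rng)
print(ordering_anomaly)
print(exclusion_anomaly)
```

Both helpers operate on the whole trace rather than on isolated event pairs, which matches the abstract's emphasis on preserving long-distance dependencies. A labeled log for fine-tuning would pair each normal or anomalous trace with its label and, for anomalies, a textual cause.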