🤖 AI Summary
Traditional anomaly detection methods fail on European XFEL accelerator control system logs due to low vocabulary diversity, absence of labels, and strong sequential dependencies. To address this, we propose a lightweight unsupervised framework: (1) sparse log entries are semantically embedded via lightweight word embeddings; (2) a dedicated Hidden Markov Model (HMM) is trained per control node to capture normal sequential behavior; and (3) anomalies are scored using a forward–backward probability ratio—requiring no log templates, manual labeling, or domain-specific priors. This work is the first to integrate lightweight word embeddings with node-level HMMs for modeling ultra-sparse industrial logs. Evaluated on real accelerator control node data, our method achieves high-precision anomaly detection and early warning, significantly improving operational responsiveness and supporting stable accelerator operation.
📝 Abstract
This article introduces a novel method for detecting anomalies within log data from control system nodes at the European XFEL accelerator. Effective anomaly detection is crucial for providing operators with a clear understanding of each node's availability, status, and potential problems, thereby ensuring smooth accelerator operation. Traditional and learning-based anomaly detection methods face significant limitations due to the sequential nature of these logs and the lack of a rich, node-specific text corpus. To address this, we propose an approach utilizing word embeddings to represent log entries and a Hidden Markov Model (HMM) to model the typical sequential patterns of these embeddings for individual nodes. Anomalies are identified by scoring individual log entries based on a probability ratio: this ratio compares the likelihood of the log sequence including the new entry against its likelihood without it, effectively measuring how well the new entry fits the established pattern. High scores indicate potential anomalies that deviate from the node's routine behavior. This method functions as a warning system, alerting operators to irregular log events that may signify underlying issues, thereby facilitating proactive intervention.