Unsupervised Log Anomaly Detection with Few Unique Tokens

📅 2023-10-13

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Traditional anomaly detection methods fail on European XFEL accelerator control system logs due to low vocabulary diversity, absence of labels, and strong sequential dependencies. To address this, we propose a lightweight unsupervised framework: (1) sparse log entries are semantically embedded via lightweight word embeddings; (2) a dedicated Hidden Markov Model (HMM) is trained per control node to capture normal sequential behavior; and (3) anomalies are scored using a forward–backward probability ratio—requiring no log templates, manual labeling, or domain-specific priors. This work is the first to integrate lightweight word embeddings with node-level HMMs for modeling ultra-sparse industrial logs. Evaluated on real accelerator control node data, our method achieves high-precision anomaly detection and early warning, significantly improving operational responsiveness and supporting stable accelerator operation.

📝 Abstract

This article introduces a novel method for detecting anomalies within log data from control system nodes at the European XFEL accelerator. Effective anomaly detection is crucial for providing operators with a clear understanding of each node's availability, status, and potential problems, thereby ensuring smooth accelerator operation. Traditional and learning-based anomaly detection methods face significant limitations due to the sequential nature of these logs and the lack of a rich, node-specific text corpus. To address this, we propose an approach utilizing word embeddings to represent log entries and a Hidden Markov Model (HMM) to model the typical sequential patterns of these embeddings for individual nodes. Anomalies are identified by scoring individual log entries based on a probability ratio: this ratio compares the likelihood of the log sequence including the new entry against its likelihood without it, effectively measuring how well the new entry fits the established pattern. High scores indicate potential anomalies that deviate from the node's routine behavior. This method functions as a warning system, alerting operators to irregular log events that may signify underlying issues, thereby facilitating proactive intervention.

Problem

Research questions and friction points this paper is trying to address.

Detect anomalies in log data with few unique tokens

Overcome limitations of sequential logs and sparse text corpus

Identify deviations from normal node behavior using embeddings and HMM

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses word embeddings for log entry representation

Applies Hidden Markov Model for sequential patterns

Detects anomalies via probability ratio scoring

🔎 Similar Papers

No similar papers found.

Authors to Follow