IDK-S: Incremental Distributional Kernel for Streaming Anomaly Detection

📅 2025-12-05

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Anomaly detection on data streams demands both high accuracy and real-time processing while the data distribution keeps evolving, a combination that existing methods struggle to deliver. This paper proposes IDK-S, an incremental detection framework built on kernel mean embedding (KME). It is presented as the first framework to combine a data-dependent kernel with the isolation-based distribution modeling inherited from the Isolation Distributional Kernel. A lightweight incremental update mechanism replaces full retraining and is shown to be statistically equivalent to the fully retrained model. The method requires no storage of historical data and processes the stream in a single pass. Experiments on 13 benchmark datasets show higher average detection accuracy than state-of-the-art methods, with inference roughly 9× faster. The method is also reported to be robust to concept drift and to incur low computational overhead, balancing accuracy, efficiency, and adaptability in streaming settings.

📝 Abstract

Anomaly detection on data streams presents significant challenges, requiring methods to maintain high detection accuracy under evolving distributions while ensuring real-time efficiency. Here we introduce $\mathcal{IDK}$-$\mathcal{S}$, a novel $\mathbf{I}$ncremental $\mathbf{D}$istributional $\mathbf{K}$ernel for $\mathbf{S}$treaming anomaly detection that addresses these challenges by creating a new dynamic representation in the kernel mean embedding framework. The superiority of $\mathcal{IDK}$-$\mathcal{S}$ is attributed to two key innovations. First, it inherits the strengths of the Isolation Distributional Kernel, an offline detector that has demonstrated significant performance advantages over foundational methods such as Isolation Forest and Local Outlier Factor, owing to its use of a data-dependent kernel. Second, it adopts a lightweight incremental update mechanism that significantly reduces computational overhead compared with the naive baseline of retraining the full model. This is achieved without compromising detection accuracy, a claim supported by its statistical equivalence to the fully retrained model. Our extensive experiments on thirteen benchmarks demonstrate that $\mathcal{IDK}$-$\mathcal{S}$ achieves superior detection accuracy while operating substantially faster, in many cases by an order of magnitude, than existing state-of-the-art methods.
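To make the idea of an incrementally maintained kernel mean embedding concrete, here is a minimal Python sketch. It is not the paper's algorithm: it swaps the data-dependent Isolation Kernel for a fixed Gaussian kernel approximated with random Fourier features, and the class name `RunningMeanEmbedding` and its parameters (`n_features`, `gamma`, `seed`) are hypothetical. It only illustrates that, for a fixed feature map, a one-pass running mean reproduces the embedding a full recomputation would give, without storing past points.

```python
import numpy as np


class RunningMeanEmbedding:
    """Illustrative running kernel mean embedding (NOT the IDK-S method).

    Assumption: a fixed Gaussian kernel exp(-gamma * ||x - y||^2),
    approximated with random Fourier features, stands in for the
    data-dependent kernel used in the paper.
    """

    def __init__(self, dim, n_features=256, gamma=1.0, seed=0):
        rng = np.random.default_rng(seed)
        # Frequencies drawn from N(0, 2 * gamma * I) match the Gaussian kernel above.
        self.W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(dim, n_features))
        self.b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
        self.scale = np.sqrt(2.0 / n_features)
        self.mean = np.zeros(n_features)  # current mean embedding
        self.count = 0                    # number of points absorbed so far

    def features(self, x):
        """Map one point (or a batch of rows) to the approximate feature space."""
        return self.scale * np.cos(np.asarray(x, dtype=float) @ self.W + self.b)

    def score(self, x):
        """Similarity of x to the embedded distribution; lower means more anomalous."""
        return float(self.features(x) @ self.mean)

    def update(self, x):
        """Absorb one point in O(n_features) time without storing the stream."""
        self.count += 1
        self.mean += (self.features(x) - self.mean) / self.count


# Usage: feed a synthetic stream point by point, then compare with a batch recomputation.
rng = np.random.default_rng(1)
stream = rng.normal(size=(500, 2))
model = RunningMeanEmbedding(dim=2)
for point in stream:
    model.update(point)

batch_mean = model.features(stream).mean(axis=0)
print(np.allclose(model.mean, batch_mean))            # running mean vs. batch mean
print(model.score([0.0, 0.0]), model.score([8.0, 8.0]))  # central point vs. far-away point
```

In this simplified setting the equivalence with recomputation is exact up to floating-point error. With a data-dependent kernel the feature map itself changes as data arrive, which is the harder case the paper's update mechanism is designed to handle.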

Problem

Research questions and friction points this paper is trying to address.

Detects anomalies in evolving data streams efficiently

Maintains high accuracy without full model retraining

Reduces computational overhead while ensuring real-time performance

Innovation

Methods, ideas, or system contributions that make the work stand out.

Incremental Distributional Kernel for streaming anomaly detection

Lightweight incremental update mechanism reduces computational overhead

Maintains detection accuracy without full model retraining
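
As a generic illustration of why an incremental update can match full retraining, consider the empirical kernel mean embedding under a feature map $\Phi$ that is held fixed. This is an assumption made for the sketch: the notation below is illustrative rather than taken from the paper, and the paper's data-dependent kernel may need additional bookkeeping. Because the embedding is an average, adding one point admits an exact constant-time update:

$$
\hat{\mu}_n = \frac{1}{n}\sum_{i=1}^{n}\Phi(x_i), \qquad
\hat{\mu}_{n+1} = \frac{n\,\hat{\mu}_n + \Phi(x_{n+1})}{n+1}
= \hat{\mu}_n + \frac{1}{n+1}\bigl(\Phi(x_{n+1}) - \hat{\mu}_n\bigr).
$$

A point $x$ can then be scored by its similarity $\langle \Phi(x), \hat{\mu}_n \rangle$ to the embedded distribution, with low similarity suggesting an anomaly.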
