IDK-S: Incremental Distributional Kernel for Streaming Anomaly Detection

📅 2025-12-05
🤖 AI Summary
Anomaly detection on data streams demands both high accuracy and real-time processing while the data distribution keeps evolving, a combination that existing methods struggle to deliver. This paper proposes a dynamic incremental detection framework based on kernel mean embedding (KME), which it presents as the first to combine a data-dependent kernel with distribution modeling inherited from the Isolation Distributional Kernel. It introduces a lightweight incremental update mechanism with a theoretical guarantee of statistical equivalence to full retraining. The method stores no historical data and processes the stream in a single pass. Experiments on 13 standard benchmark datasets show higher average detection accuracy than state-of-the-art methods, with inference approximately 9× faster. The method is also robust to concept drift and incurs low computational overhead, balancing accuracy, efficiency, and adaptability in streaming settings.

📝 Abstract
Anomaly detection on data streams presents significant challenges, requiring methods to maintain high detection accuracy under evolving distributions while ensuring real-time efficiency. Here we introduce $\mathcal{IDK}$-$\mathcal{S}$, a novel $\mathbf{I}$ncremental $\mathbf{D}$istributional $\mathbf{K}$ernel for $\mathbf{S}$treaming anomaly detection that addresses these challenges by creating a new dynamic representation in the kernel mean embedding framework. The superiority of $\mathcal{IDK}$-$\mathcal{S}$ is attributed to two key innovations. First, it inherits the strengths of the Isolation Distributional Kernel, an offline detector that has demonstrated significant performance advantages over foundational methods such as Isolation Forest and Local Outlier Factor because it uses a data-dependent kernel. Second, it adopts a lightweight incremental update mechanism that significantly reduces computational overhead compared with the naive baseline of fully retraining the model. This is achieved without compromising detection accuracy, a claim supported by its statistical equivalence to the fully retrained model. Our extensive experiments on thirteen benchmarks demonstrate that $\mathcal{IDK}$-$\mathcal{S}$ achieves superior detection accuracy while operating substantially faster, in many cases by an order of magnitude, than existing state-of-the-art methods.
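
To make the idea of an incremental update that matches full retraining concrete, here is a minimal sketch of a running kernel mean embedding. It is not the paper's algorithm: it assumes a fixed, explicit feature map (random Fourier features for a Gaussian kernel, standing in for the data-dependent kernel), and the names `make_feature_map`, `RunningMeanEmbedding`, and `batch_mean_embedding` are hypothetical. Under that assumption, the one-pass mean agrees with the mean recomputed from all stored points up to floating-point error.

```python
import math
import random


def make_feature_map(dim, n_features=64, gamma=1.0, seed=0):
    """Random Fourier features for exp(-gamma * ||x - y||^2).

    Assumption: this fixed map stands in for a data-dependent kernel.
    """
    rng = random.Random(seed)
    scale = math.sqrt(2.0 * gamma)  # frequencies are drawn from N(0, 2 * gamma * I)
    weights = [[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(n_features)]
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    norm = math.sqrt(2.0 / n_features)

    def phi(x):
        return [
            norm * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
            for w, b in zip(weights, offsets)
        ]

    return phi


class RunningMeanEmbedding:
    """Keeps only the current mean feature vector and a count (no stored history)."""

    def __init__(self, phi):
        self.phi = phi
        self.count = 0
        self.mean = None

    def update(self, x):
        fx = self.phi(x)
        self.count += 1
        if self.mean is None:
            self.mean = fx
        else:
            # Standard running-mean update: mean += (new - mean) / n
            n = self.count
            self.mean = [m + (f - m) / n for m, f in zip(self.mean, fx)]
        return self.mean


def batch_mean_embedding(phi, points):
    """Full recomputation from all points, used here only as a reference."""
    feats = [phi(x) for x in points]
    return [sum(col) / len(feats) for col in zip(*feats)]


rng = random.Random(1)
stream = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(200)]

phi = make_feature_map(dim=2)
running = RunningMeanEmbedding(phi)
for x in stream:
    running.update(x)

full = batch_mean_embedding(phi, stream)
max_gap = max(abs(a - b) for a, b in zip(running.mean, full))
print(max_gap < 1e-9)
```

With a fixed feature map the equivalence is just the running-mean identity. The harder case, where the kernel itself depends on the data and must change as the stream evolves, is outside the scope of this sketch.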
Problem

Research questions and friction points this paper is trying to address.

Detects anomalies in evolving data streams efficiently
Maintains high accuracy without full model retraining
Reduces computational overhead while ensuring real-time performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Incremental Distributional Kernel for streaming anomaly detection
Lightweight incremental update mechanism reduces computational overhead
Maintains detection accuracy without full model retraining
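
As an illustration of scoring a point against a distribution summary that is updated in a single pass, the following is a simplified sketch in the spirit of an isolation-style kernel built from hyperspheres. It is an assumption-laden toy, not the published IDK or IDK-S procedure: the partitions are built once from a reference sample and never refreshed, and `BallCountScorer`, `n_partitions`, and `psi` are hypothetical names. A point that falls outside every ball gets a score of zero, which marks it as the most anomalous.

```python
import math
import random


def dist(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


class BallCountScorer:
    """Per-ball counts act as an unnormalised mean embedding of the points seen."""

    def __init__(self, reference, n_partitions=50, psi=8, seed=0):
        rng = random.Random(seed)
        # Each partition: psi centres sampled from the reference data.
        self.partitions = [rng.sample(reference, psi) for _ in range(n_partitions)]
        # Ball radius: distance from a centre to its nearest other centre.
        self.radii = [
            [min(dist(c, o) for j, o in enumerate(cs) if j != i) for i, c in enumerate(cs)]
            for cs in self.partitions
        ]
        self.counts = [[0] * psi for _ in range(n_partitions)]
        self.n_seen = 0

    def cells(self, x):
        """Index of the ball containing x in each partition, or None if outside all."""
        out = []
        for cs, rs in zip(self.partitions, self.radii):
            i = min(range(len(cs)), key=lambda k: dist(x, cs[k]))
            out.append(i if dist(x, cs[i]) <= rs[i] else None)
        return out

    def update(self, x):
        """Single-pass update: increment counts, then x can be discarded."""
        for counts, cell in zip(self.counts, self.cells(x)):
            if cell is not None:
                counts[cell] += 1
        self.n_seen += 1

    def score(self, x):
        """Average share of seen points in x's ball; lower means more anomalous.

        Requires at least one prior call to update().
        """
        total = sum(
            counts[cell]
            for counts, cell in zip(self.counts, self.cells(x))
            if cell is not None
        )
        return total / (len(self.partitions) * self.n_seen)


rng = random.Random(1)
data = [[rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)] for _ in range(300)]

scorer = BallCountScorer(reference=data[:100])
for x in data:
    scorer.update(x)

inlier_score = scorer.score(scorer.partitions[0][0])  # a point taken from the stream
outlier_score = scorer.score([8.0, 8.0])              # far outside the data range
print(inlier_score > outlier_score)
```

The design choice worth noting is that `update` touches only a fixed-size table of counts, so memory does not grow with the stream. Handling drift would require refreshing the partitions as well, which this sketch deliberately leaves out.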

Authors

Yang Xu (State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China)
Yixiao Ma (State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China)
Kaifeng Zhang (Columbia University; Robotics, Physics Simulation, Machine Learning, Computer Vision)
Zuliang Yang (State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China)
Kai Ming Ting (Nanjing University; Machine Learning, Data Mining)