🤖 AI Summary
Existing anomaly detection methods have two limitations: offline approaches cannot process data streams in real time, while online methods typically rely on periodic retraining or on storing historical data. This paper introduces Online-iForest, an online variant of Isolation Forest designed explicitly for streaming data, which updates incrementally in a single pass without periodic retraining or repeated access to stored data. Its core components include a dynamic construction mechanism for random-split trees, a node-weight decay strategy, and a sliding time window for tracking concept drift. On real-world datasets, Online-iForest performs on par with online alternatives and comes close to state-of-the-art offline methods that are periodically retrained, while outperforming all competitors in efficiency. This makes it well suited to applications where anomalies must be identified quickly, such as cybersecurity, fraud detection, and fault detection.
📝 Abstract
The anomaly detection literature abounds with offline methods, which require repeated access to data in memory and impose impractical assumptions when applied to a streaming context. Existing online anomaly detection methods also generally fail to address these constraints, resorting to periodic retraining to adapt to the online context. We propose Online-iForest, a novel method explicitly designed for streaming conditions that seamlessly tracks the data generating process as it evolves over time. Experimental validation on real-world datasets demonstrated that Online-iForest is on par with online alternatives and closely rivals state-of-the-art offline anomaly detection techniques that undergo periodic retraining. Notably, Online-iForest consistently outperforms all competitors in terms of efficiency, making it a promising solution in applications where fast identification of anomalies is of primary importance, such as cybersecurity, fraud detection, and fault detection.
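To make the streaming setting concrete, the following is a minimal sketch of the single-pass "score, then update" loop that an online detector follows, with no retraining step. The detector shown is a deliberately simple stand-in (a z-score over a bounded sliding window), not Online-iForest itself; the names `SlidingWindowZScore`, `run_stream`, and the `window` parameter are hypothetical and chosen only for illustration.

```python
from collections import deque
import math


class SlidingWindowZScore:
    """Stand-in streaming detector (NOT Online-iForest).

    Scores a value by its distance from the mean of the most recent
    `window` values, measured in standard deviations.
    """

    def __init__(self, window: int = 50):
        self.window = window
        self.buf = deque()  # holds at most `window` recent values

    def score(self, x: float) -> float:
        n = len(self.buf)
        if n < 2:
            return 0.0  # not enough history to judge
        mean = sum(self.buf) / n
        var = sum((v - mean) ** 2 for v in self.buf) / n
        std = math.sqrt(var)
        return abs(x - mean) / std if std > 0 else 0.0

    def learn(self, x: float) -> None:
        # Incremental update: add the new value, forget the oldest one
        # once the window is full, so memory stays bounded.
        self.buf.append(x)
        if len(self.buf) > self.window:
            self.buf.popleft()


def run_stream(stream, detector):
    """Single pass over the stream: score each value on arrival, then update."""
    scores = []
    for x in stream:
        scores.append(detector.score(x))
        detector.learn(x)
    return scores


if __name__ == "__main__":
    # Alternating 0/1 signal with one injected outlier at index 40.
    stream = [0.0, 1.0] * 20 + [10.0] + [0.0, 1.0] * 5
    scores = run_stream(stream, SlidingWindowZScore(window=10))
    print(max(range(len(scores)), key=scores.__getitem__))  # index of the top score
```

Because each value is scored before it is learned, the loop never looks ahead, and because the window is bounded, memory use does not grow with the length of the stream. Any detector exposing the same two operations could be dropped into `run_stream` unchanged.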