Reasoning-based Anomaly Detection Framework: A Real-time, Scalable, and Automated Approach to Anomaly Detection Across Domains

📅 2025-10-03
🤖 AI Summary
Anomaly detection in large-scale distributed systems faces three core challenges: (1) real-time processing of high-throughput time-series data, (2) adaptive modeling of heterogeneous metrics across engineering, business, and operational domains, and (3) root-cause localization, which requires identifying, in real time, the time series causally associated with an anomaly. This paper proposes a real-time, scalable, end-to-end framework that addresses these challenges. Methodologically, it introduces (1) an inference-driven hierarchical detection architecture integrating stream processing with multidimensional time-series modeling; (2) mSelect, an automated technique that jointly performs algorithm selection and hyperparameter tuning for each use case; and (3) an integrated causal inference module for efficient, interpretable root-cause attribution. Evaluated on nine public benchmark datasets, the framework outperforms state-of-the-art models in AUC on five of them and attains an AUC above 0.85 on seven, a result the authors report no other state-of-the-art model matches.

📝 Abstract
Detecting anomalies in large, distributed systems presents several challenges. The first arises from the sheer volume of data that needs to be processed: flagging anomalies in a high-throughput environment calls for careful consideration of both algorithm and system design. The second comes from the heterogeneity of the time-series datasets that use such a system in production. In practice, anomaly detection systems are rarely deployed for a single use case. Typically, there are several metrics to monitor, often across several domains (e.g., engineering, business, and operations). A one-size-fits-all approach rarely works, so these systems need to be fine-tuned for every application, a process that is often done manually. The third challenge is that determining the root cause of anomalies in such settings is akin to finding a needle in a haystack: identifying, in real time, a time-series dataset that is causally associated with the anomalous time series is a very difficult problem. In this paper, we describe a unified framework that addresses these challenges. The Reasoning-based Anomaly Detection Framework (RADF) is designed to perform real-time anomaly detection on very large datasets. The framework employs a novel technique (mSelect) that automates algorithm selection and hyper-parameter tuning for each use case. Finally, it incorporates a post-detection capability that allows for faster triaging and root-cause determination. Our extensive experiments demonstrate that RADF, powered by mSelect, surpasses state-of-the-art anomaly detection models in AUC on 5 of 9 public benchmarking datasets. RADF achieved an AUC above 0.85 on 7 of the 9 datasets, a distinction unmatched by any other state-of-the-art model.
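The abstract names mSelect but does not describe how it works. Purely as an illustration of what per-use-case algorithm selection and hyper-parameter tuning can look like, the Python sketch below grid-searches two simple detectors (a rolling z-score and a rolling median/MAD score) over a window length and keeps the configuration with the highest AUC on labeled history. The detector functions, the `DETECTORS` and `WINDOWS` search space, and the assumption that labeled anomalies exist for each metric are all hypothetical; this is not the authors' mSelect technique.

```python
import statistics
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical illustration, NOT mSelect: a scorer maps (series, window) to one
# anomaly score per point; a higher score means "more anomalous".
Scorer = Callable[[Sequence[float], int], List[float]]


def zscore_scores(series: Sequence[float], window: int) -> List[float]:
    """Absolute z-score of each point against the preceding `window` points."""
    scores: List[float] = []
    for i, x in enumerate(series):
        history = series[max(0, i - window):i]
        if len(history) < 2:  # not enough history to estimate spread
            scores.append(0.0)
            continue
        mu = statistics.fmean(history)
        sd = statistics.stdev(history) or 1e-9  # guard against a flat window
        scores.append(abs(x - mu) / sd)
    return scores


def mad_scores(series: Sequence[float], window: int) -> List[float]:
    """Robust score: distance from the rolling median in units of the rolling MAD."""
    scores: List[float] = []
    for i, x in enumerate(series):
        history = series[max(0, i - window):i]
        if len(history) < 2:
            scores.append(0.0)
            continue
        med = statistics.median(history)
        mad = statistics.median([abs(h - med) for h in history]) or 1e-9
        scores.append(abs(x - med) / mad)
    return scores


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC via the Mann-Whitney statistic (ties count as half a win)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one anomalous and one normal point")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical search space: detector family x window length.
DETECTORS: Dict[str, Scorer] = {"zscore": zscore_scores, "mad": mad_scores}
WINDOWS: List[int] = [5, 10, 20]


def select_detector(series: Sequence[float], labels: Sequence[int]) -> Tuple[str, int, float]:
    """Return (detector name, window, AUC) of the best configuration on labeled history."""
    best: Tuple[str, int, float] = ("", 0, -1.0)
    for (name, scorer), window in product(DETECTORS.items(), WINDOWS):
        score = auc(scorer(series, window), labels)
        if score > best[2]:  # strict '>' keeps the earliest configuration on ties
            best = (name, window, score)
    return best


if __name__ == "__main__":
    # Toy metric: a small repeating pattern with three injected spikes.
    series = [10.0 + (i % 7) * 0.1 for i in range(200)]
    spikes = (50, 120, 170)
    for i in spikes:
        series[i] = 30.0
    labels = [1 if i in spikes else 0 for i in range(200)]
    print(select_detector(series, labels))
```

A realistic selector would score configurations on held-out data rather than on the same history it searches over, and would cover a far larger space; the exhaustive grid here only keeps the idea visible.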
Problem

Detecting anomalies in large distributed systems with high data volume
Automating algorithm selection and tuning for heterogeneous time-series datasets
Identifying root causes of anomalies in real time across domains
Innovation

Real-time anomaly detection on large datasets
Automated algorithm selection and hyper-parameter tuning
Post-detection triaging and root-cause determination
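The source does not say how RADF's post-detection step finds the time series associated with an anomaly. A common first-pass heuristic, sketched below in Python under that caveat, ranks candidate metrics by their strongest lagged Pearson correlation with the anomalous metric, trying only lags where the candidate moves at the same time as or before the target. The function names, the `max_lag` parameter, and the assumption that all series are equal-length and time-aligned are hypothetical. Correlation is not causation, so a ranking like this can at most shortlist candidates before a proper causal test.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical screening heuristic, NOT RADF's root-cause method.


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def lead_correlation(candidate: Sequence[float], target: Sequence[float],
                     max_lag: int) -> Tuple[int, float]:
    """Strongest correlation of candidate[t - lag] with target[t], lag in 0..max_lag.

    Only non-negative lags are tried, so the candidate must move at the same
    time as, or before, the anomalous target.
    """
    if len(candidate) != len(target):
        raise ValueError("series must be equal-length and time-aligned")
    if not 0 <= max_lag < len(target) - 1:
        raise ValueError("max_lag must leave at least two overlapping points")
    best_lag, best_corr = 0, 0.0
    for lag in range(max_lag + 1):
        x = candidate[: len(candidate) - lag]  # candidate at times 0 .. n-1-lag
        y = target[lag:]                       # target at times lag .. n-1
        corr = pearson(x, y)
        if abs(corr) > abs(best_corr):
            best_lag, best_corr = lag, corr
    return best_lag, best_corr


def rank_candidates(target: Sequence[float], candidates: Dict[str, Sequence[float]],
                    max_lag: int = 3) -> List[Tuple[str, int, float]]:
    """Rank candidate metrics by |lagged correlation| with the target, strongest first."""
    ranked = [(name, *lead_correlation(series, target, max_lag))
              for name, series in candidates.items()]
    ranked.sort(key=lambda item: abs(item[2]), reverse=True)
    return ranked
```

Each result is a `(name, lag, correlation)` tuple, so a triage view could show not only which metric moved with the anomaly but also how many steps earlier it moved.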
Anupam Panwar
Apple, Cupertino, California, USA
Himadri Pal
Apple, Cupertino, California, USA
Jiali Chen
Apple
Machine Learning
Kyle Cho
Apple, Cupertino, California, USA
Riddick Jiang
Apple, Cupertino, California, USA
Miao Zhao
Apple, Cupertino, California, USA
Rajiv Krishnamurthy
Apple, Cupertino, California, USA