🤖 AI Summary
Time-series anomaly detection lacks reliable evaluation metrics: among 37 widely used metrics, none simultaneously satisfies verifiable core properties such as sensitivity, robustness, and monotonicity, which distorts performance assessment and poses risks to safety-critical systems.
Method: This work establishes the first theoretically grounded, verifiability-oriented framework for evaluating time-series anomaly detection, systematically exposing fundamental flaws in existing metrics. Building on this analysis, we propose LARM, the first metric provably satisfying all core properties, and its enhanced variant ALARM, supported by formal modeling, rigorous theoretical proofs, and extensive empirical validation across diverse benchmarks.
Contribution/Results: LARM and ALARM deliver markedly better accuracy, consistency, and cross-method comparability than state-of-the-art metrics, providing both a unified theoretical foundation and a practical, deployable tool for trustworthy anomaly detection evaluation.
📝 Abstract
Undetected anomalies in time series can trigger catastrophic failures in safety-critical systems, such as chemical plant explosions or power grid outages. Although many detection methods have been proposed, their performance remains unclear because current metrics capture only narrow aspects of the task and often yield misleading results. We address this issue by introducing verifiable properties that formalize essential requirements for evaluating time-series anomaly detection. These properties enable a theoretical framework that supports principled evaluations and reliable comparisons. Analyzing 37 widely used metrics, we show that most satisfy only a few properties, and none satisfy all, explaining persistent inconsistencies in prior results. To close this gap, we propose LARM, a flexible metric that provably satisfies all properties, and extend it to ALARM, an advanced variant meeting stricter requirements.
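To make the abstract's claim concrete, here is a minimal sketch (not from the paper) of one well-known way a popular evaluation protocol can mislead: under "point adjustment", flagging a single point of a true anomaly segment counts the entire segment as detected, which can inflate F1 to a perfect score. The function names `f1` and `point_adjust` are hypothetical illustrations, not the paper's API.

```python
# Hypothetical sketch: how the common "point adjustment" protocol can
# inflate F1, illustrating why a metric can yield misleading results.

def f1(labels, preds):
    # Plain point-wise F1 over binary labels/predictions.
    tp = sum(1 for l, p in zip(labels, preds) if l == 1 and p == 1)
    fp = sum(1 for l, p in zip(labels, preds) if l == 0 and p == 1)
    fn = sum(1 for l, p in zip(labels, preds) if l == 1 and p == 0)
    if tp == 0:
        return 0.0
    prec, rec = tp / (tp + fp), tp / (tp + fn)
    return 2 * prec * rec / (prec + rec)

def point_adjust(labels, preds):
    # If any point inside a true anomaly segment is flagged, mark the
    # whole segment as detected (the adjustment used by many TSAD papers).
    adjusted = list(preds)
    i = 0
    while i < len(labels):
        if labels[i] == 1:
            j = i
            while j < len(labels) and labels[j] == 1:
                j += 1
            if any(adjusted[k] == 1 for k in range(i, j)):
                for k in range(i, j):
                    adjusted[k] = 1
            i = j
        else:
            i += 1
    return adjusted

labels = [0, 1, 1, 1, 1, 0, 0, 0]
preds  = [0, 0, 0, 1, 0, 0, 0, 0]   # detects only 1 of 4 anomalous points

print(f1(labels, preds))                        # → 0.4 (raw F1)
print(f1(labels, point_adjust(labels, preds)))  # → 1.0 (inflated to perfect)
```

A detector that catches one point out of four here scores a perfect 1.0 after adjustment, the kind of distortion that verifiable properties such as monotonicity are meant to rule out.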