Formally Exploring Time-Series Anomaly Detection Evaluation Metrics

📅 2025-10-20
🤖 AI Summary
Problem: Time-series anomaly detection lacks reliable evaluation metrics. Among 37 widely used metrics, none simultaneously satisfies all verifiable core properties (such as sensitivity, robustness, and monotonicity), which distorts performance assessment and poses risks to safety-critical systems. Method: This work establishes the first theoretically grounded, verifiability-oriented framework for evaluating time-series anomaly detection and systematically exposes fundamental flaws in existing metrics. Building on this analysis, it proposes LARM, the first metric that provably satisfies all core properties, and its enhanced variant ALARM, supported by formal modeling, theoretical proofs, and empirical validation across diverse benchmarks. Contribution/Results: LARM and ALARM are reported to give more accurate, more consistent, and more comparable assessments across methods than state-of-the-art metrics. They provide both a unified theoretical foundation and a practical, deployable tool for trustworthy anomaly detection evaluation.

📝 Abstract
Undetected anomalies in time series can trigger catastrophic failures in safety-critical systems, such as chemical plant explosions or power grid outages. Although many detection methods have been proposed, their performance remains unclear because current metrics capture only narrow aspects of the task and often yield misleading results. We address this issue by introducing verifiable properties that formalize essential requirements for evaluating time-series anomaly detection. These properties enable a theoretical framework that supports principled evaluations and reliable comparisons. Analyzing 37 widely used metrics, we show that most satisfy only a few properties, and none satisfy all, explaining persistent inconsistencies in prior results. To close this gap, we propose LARM, a flexible metric that provably satisfies all properties, and extend it to ALARM, an advanced variant meeting stricter requirements.
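
The claim that common metrics can mislead is easy to illustrate on a toy case. The sketch below is not taken from the paper and does not implement LARM or ALARM. It compares two hypothetical detectors under point-wise F1 and under the widely used point-adjustment protocol (if any point of an anomalous segment is flagged, the whole segment counts as detected). Whether these two metrics are among the 37 analyzed is not stated here; the data and the names `f1`, `point_adjust`, `det_a`, and `det_b` are assumptions made for illustration.

```python
def f1(pred, truth):
    """Point-wise F1 score for two equal-length 0/1 label sequences."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def point_adjust(pred, truth):
    """Mark a whole ground-truth segment as detected if any point in it is flagged."""
    adjusted = list(pred)
    i, n = 0, len(truth)
    while i < n:
        if truth[i] == 1:
            j = i
            while j < n and truth[j] == 1:
                j += 1                      # j is one past the end of the segment
            if any(pred[i:j]):
                adjusted[i:j] = [1] * (j - i)
            i = j
        else:
            i += 1
    return adjusted


# Toy data (hypothetical): one anomalous segment covering indices 2..7.
truth = [0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
det_a = [1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0]  # covers 4 of 6 points, 2 false alarms
det_b = [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0]  # covers 1 of 6 points, no false alarms

print(round(f1(det_a, truth), 3))                       # 0.667
print(round(f1(det_b, truth), 3))                       # 0.286
print(round(f1(point_adjust(det_a, truth), truth), 3))  # 0.857
print(round(f1(point_adjust(det_b, truth), truth), 3))  # 1.0
```

On this toy input, point-wise F1 ranks detector A above B, while the point-adjusted score ranks B above A. Neither ranking is obviously the right one, which is the kind of inconsistency that formally stated properties are meant to resolve.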
Problem

Research questions and friction points this paper is trying to address.

Evaluation of time-series anomaly detection lacks reliable metrics
Current metrics capture only narrow aspects of the task and yield misleading, inconsistent results
No set of verifiable properties exists to underpin a principled evaluation framework
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introducing verifiable properties that formalize the requirements of an evaluation framework
Proposing LARM, a flexible metric that provably satisfies all of these properties
Extending LARM to ALARM, an advanced variant that meets stricter requirements
Authors

Dennis Wagner
Department of Machine Learning, University of Kaiserslautern-Landau, Kaiserslautern, Germany
Arjun Nair
Department of Machine Learning, University of Kaiserslautern-Landau, Kaiserslautern, Germany
Billy Joe Franks
Department of Machine Learning, University of Kaiserslautern-Landau, Kaiserslautern, Germany
Justus Arweiler
Laboratory of Engineering Thermodynamics, University of Kaiserslautern-Landau, Kaiserslautern, Germany
Aparna Muraleedharan
Department of Chemical Process Engineering, Technical University of Munich, Straubing, Germany
Indra Jungjohann
Laboratory of Engineering Thermodynamics, University of Kaiserslautern-Landau, Kaiserslautern, Germany
Fabian Hartung
BASF SE, Ludwigshafen am Rhein, Germany
Mayank C. Ahuja
Department of Machine Learning, University of Kaiserslautern-Landau, Kaiserslautern, Germany
Andriy Balinskyy
Department of Machine Learning, University of Kaiserslautern-Landau, Kaiserslautern, Germany
Saurabh Varshneya
University of Kaiserslautern-Landau
Interpretable Machine Learning, Multimodal Learning, Image Processing
Nabeel Hussain Syed
Department of Machine Learning, University of Kaiserslautern-Landau, Kaiserslautern, Germany
Mayank Nagda
Research Associate at RPTU Kaiserslautern-Landau
Machine Learning, Artificial Intelligence, Natural Language Processing, Computer Vision
Phillip Liznerski
Department of Machine Learning, University of Kaiserslautern-Landau, Kaiserslautern, Germany
Steffen Reithermann
Department of Machine Learning, University of Kaiserslautern-Landau, Kaiserslautern, Germany
Maja Rudolph
Data Science Institute, University of Wisconsin-Madison, Madison, USA
Sebastian Vollmer
Department of Machine Learning, University of Kaiserslautern-Landau, Kaiserslautern, Germany
Ralf Schulz
Department of Natural and Environmental Science, University of Kaiserslautern-Landau, Landau, Germany
Torsten Katz
BASF SE, Ludwigshafen am Rhein, Germany
Stephan Mandt
Associate Professor, University of California, Irvine
Artificial Intelligence, Machine Learning, Compression, AI for Science, Generative Models
Michael Bortz
Department of Optimization, Fraunhofer Institute for Industrial Mathematics, Kaiserslautern, Germany
Heike Leitte
Professor of Computer Science, TU Kaiserslautern
Visualization, Visual Analytics, Data Science
Daniel Neider
TU Dortmund University and Center for Trustworthy Data Science and Security
Formal Methods, Machine Learning, Logic, Artificial Intelligence
Jakob Burger
Technical University of Munich
Synthetic Fuels, Optimisation, Raw Material Change, C1 chemistry, Biotechnology
Fabian Jirasek
Laboratory of Engineering Thermodynamics (LTD), RPTU Kaiserslautern
Chemical Engineering, Bioprocess Engineering, Thermodynamics, Machine Learning
Hans Hasse
University of Kaiserslautern
Chemical Engineering