🤖 AI Summary
Problem: Existing anomaly detection benchmarks fail to account for the diversity of anomaly types and application contexts, which hinders rigorous algorithm comparison and practical deployment. Method: We propose a scenario-driven evaluation paradigm with three parts: (1) a common taxonomy of anomaly detection scenarios spanning domains such as predictive maintenance and scientific discovery; (2) analysis of detection pipelines both end-to-end and component by component; and (3) evaluation that is meaningful with respect to each scenario's objectives. Contribution/Results: We argue that the stagnation seen in current benchmarks, where new algorithms differ only marginally from established baselines, stems from evaluation that ignores scenario characteristics. As a position paper, this work outlines an evaluation framework intended to make algorithm comparisons more valid and research findings easier to transfer to real applications.
📝 Abstract
Despite a steady stream of new anomaly detection algorithms and extensive benchmarking efforts, progress seems to have stagnated, with only minor performance differences between established baselines and new algorithms. In this position paper, we argue that this stagnation is due to limitations in how we evaluate anomaly detection algorithms. Current benchmarking does not, for example, sufficiently reflect the diversity of anomalies in applications ranging from predictive maintenance to scientific discovery. Consequently, we need to rethink benchmarking in anomaly detection. In our opinion, anomaly detection should be studied using scenarios that capture the relevant characteristics of different applications. We identify three key areas for improvement. First, we need to identify anomaly detection scenarios based on a common taxonomy. Second, anomaly detection pipelines should be analyzed both end-to-end and by component. Third, the evaluation of anomaly detection algorithms should be meaningful with respect to the scenario's objectives.