We Need to Rethink Benchmarking in Anomaly Detection

📅 2025-07-21
📈 Citations: 0 · Influential: 0
🤖 AI Summary
Problem: Existing anomaly detection benchmarks fail to account for the diversity of anomaly types and application contexts, which hinders rigorous algorithm comparison and practical deployment. Method: The authors propose a scenario-driven evaluation paradigm: (1) constructing a common taxonomy of anomaly detection scenarios spanning domains such as predictive maintenance and scientific discovery; (2) analyzing detection pipelines both end-to-end and at the level of individual components; and (3) designing interpretable, task-specific evaluation metrics aligned with each scenario's objectives. Contribution/Results: The position paper identifies benchmark agnosticism toward scenario semantics as a root cause of the apparent stagnation in anomaly detection research, explaining why traditional benchmarks fail to discriminate meaningfully between algorithms. The proposed framework aims to make algorithmic comparisons both practically relevant and scientifically rigorous, improving the translational value of research findings.
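
To make the proposed paradigm concrete, here is a minimal sketch of scenario-driven evaluation, assuming a hypothetical representation; none of the class or function names below come from the paper. A scenario record carries taxonomy attributes, and a pipeline is split into swappable components so it can be scored end-to-end or one component at a time.

```python
# Hypothetical sketch, not the paper's implementation: scenarios as taxonomy
# records, pipelines as swappable components.
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Scenario:
    """One anomaly detection scenario drawn from a shared taxonomy."""
    name: str                     # e.g. "predictive_maintenance"
    anomaly_types: Sequence[str]  # e.g. ("point", "collective", "drift")
    data_modality: str            # e.g. "multivariate_time_series"
    objective: str                # e.g. "minimize_detection_delay"


@dataclass
class Pipeline:
    """A detection pipeline decoupled into components."""
    preprocess: Callable  # raw data -> features
    score: Callable       # features -> anomaly scores
    threshold: Callable   # scores -> binary predictions


def evaluate(pipeline: Pipeline, data, labels, metric: Callable) -> float:
    """End-to-end evaluation; a component-level analysis would instead
    swap one component at a time while holding the others fixed."""
    features = pipeline.preprocess(data)
    scores = pipeline.score(features)
    predictions = pipeline.threshold(scores)
    return metric(labels, predictions)
```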

📝 Abstract
Despite the continuous proposal of new anomaly detection algorithms and extensive benchmarking efforts, progress seems to stagnate, with only minor performance differences between established baselines and new algorithms. In this position paper, we argue that this stagnation is due to limitations in how we evaluate anomaly detection algorithms. Current benchmarking does not, for example, sufficiently reflect the diversity of anomalies in applications ranging from predictive maintenance to scientific discovery. Consequently, we need to rethink benchmarking in anomaly detection. In our opinion, anomaly detection should be studied using scenarios that capture the relevant characteristics of different applications. We identify three key areas for improvement: First, we need to identify anomaly detection scenarios based on a common taxonomy. Second, anomaly detection pipelines should be analyzed end-to-end and by component. Third, evaluating anomaly detection algorithms should be meaningful regarding the scenario's objectives.
Problem

Research questions and friction points this paper is trying to address.

Rethinking benchmarking due to stagnant anomaly detection progress
Current benchmarks lack diversity in real-world anomaly scenarios
Improving evaluation with taxonomy-based scenarios and end-to-end analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Identify scenarios using common taxonomy
Analyze pipelines end-to-end and by component
Evaluate algorithms based on scenario objectives (see the metric sketch below)
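
As a concrete illustration of the last point, here is a hypothetical metric sketch; the paper argues for objective-aligned metrics without prescribing this one. In a predictive-maintenance scenario, mean detection delay after fault onset reflects the scenario's objective more directly than a scenario-agnostic score such as AUROC.

```python
# Hypothetical scenario-aligned metric: mean detection delay, i.e. how many
# time steps pass between a fault's onset and the first alarm at or after it.
# Illustrative only, not a metric defined in the paper.
from typing import Sequence


def mean_detection_delay(onsets: Sequence[int],
                         alarms: Sequence[int],
                         horizon: int) -> float:
    """Average delay from each fault onset to the first subsequent alarm;
    undetected faults are penalized with the full `horizon`."""
    delays = []
    for onset in onsets:
        later = [alarm - onset for alarm in alarms if alarm >= onset]
        delays.append(min(later) if later else horizon)
    return sum(delays) / len(delays) if delays else 0.0


# Example: faults start at t=10 and t=50; alarms fire at t=12 and t=58.
print(mean_detection_delay(onsets=[10, 50], alarms=[12, 58], horizon=100))
# -> (2 + 8) / 2 = 5.0
```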