TAB: Unified Benchmarking of Time Series Anomaly Detection Methods

📅 2025-06-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current time-series anomaly detection (TSAD) research suffers from the absence of a unified, reproducible evaluation benchmark, hindering fair and systematic method comparison. To address this, we introduce TAB, the first open-source, large-scale, multi-paradigm TSAD benchmark. TAB encompasses 29 publicly available multivariate datasets and 1,635 univariate series, enabling cross-domain and cross-scenario evaluation. It provides the first standardized evaluation framework supporting non-learning, machine-learning, deep-learning, pre-trained, and large language model–based approaches. Employing consistent data splits, evaluation metrics, and hyperparameter protocols, TAB systematically evaluates over 50 state-of-the-art TSAD algorithms, revealing their performance boundaries across diverse anomaly types and data characteristics. All datasets, source code, and experimental results are publicly released, establishing a reliable, extensible foundation for rigorous TSAD research and development.

📝 Abstract
Time series anomaly detection (TSAD) plays an important role in many domains such as finance, transportation, and healthcare. With the ongoing instrumentation of reality, more time series data will be available, leading also to growing demands for TSAD. While many TSAD methods already exist, new and better methods are still desirable. However, effective progress hinges on the availability of reliable means of evaluating new methods and comparing them with existing methods. We address deficiencies in current evaluation procedures related to datasets and experimental settings and protocols. Specifically, we propose a new time series anomaly detection benchmark, called TAB. First, TAB encompasses 29 public multivariate datasets and 1,635 univariate time series from different domains to facilitate more comprehensive evaluations on diverse datasets. Second, TAB covers a variety of TSAD methods, including non-learning, machine learning, deep learning, LLM-based, and time-series pre-trained methods. Third, TAB features a unified and automated evaluation pipeline that enables fair and easy evaluation of TSAD methods. Finally, we employ TAB to evaluate existing TSAD methods and report on the outcomes, thereby offering a deeper insight into the performance of these methods. All datasets and code are available at https://github.com/decisionintelligence/TAB.
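The core of the unified pipeline the abstract describes is that every detector, whatever its paradigm, sees the same train/test split and is scored with the same metric. The following is a minimal sketch of that protocol, not TAB's actual API: the two toy detectors, the 50/50 split, and the plain point-wise F1 score are illustrative assumptions.

```python
import random
import statistics

def zscore_detector(train, test, threshold=3.0):
    # Non-learning baseline: flag test points far from the training mean.
    mu = statistics.fmean(train)
    sigma = statistics.pstdev(train)
    return [1 if abs(x - mu) / sigma > threshold else 0 for x in test]

def range_detector(train, test, margin=0.5):
    # Second baseline: flag test points outside an expanded training range.
    lo, hi = min(train), max(train)
    span = hi - lo
    return [1 if x < lo - margin * span or x > hi + margin * span else 0
            for x in test]

def f1(y_true, y_pred):
    # Plain point-wise F1; TAB supports several TSAD metrics, this is one
    # simple stand-in.
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def evaluate(series, labels, detectors, split=0.5):
    # One shared split and one shared metric for every method: the
    # "consistent data splits and evaluation metrics" idea in miniature.
    cut = int(len(series) * split)
    train, test = series[:cut], series[cut:]
    y_true = labels[cut:]
    return {name: f1(y_true, fn(train, test))
            for name, fn in detectors.items()}

# Synthetic univariate series with one injected anomaly segment.
random.seed(42)
series = [random.gauss(0, 1) for _ in range(1000)]
labels = [0] * 1000
for i in range(700, 710):
    series[i] += 8
    labels[i] = 1

scores = evaluate(series, labels,
                  {"zscore": zscore_detector, "range": range_detector})
print(scores)
```

Adding a new method under this protocol only requires supplying another `(train, test) -> predictions` function; the split and metric stay fixed, which is what makes the comparison fair.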
Problem

Research questions and friction points this paper is trying to address.

Address deficiencies in current TSAD evaluation procedures
Propose a unified benchmark for diverse TSAD methods
Enable fair and automated evaluation of TSAD performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified benchmark for diverse TSAD datasets
Automated pipeline for fair method evaluation
Comprehensive coverage of TSAD techniques