AI Summary
This work addresses the lack of a standardized evaluation framework in time series anomaly detection, where existing metrics and thresholding strategies often lead to misleading performance assessments. To this end, we propose GNN-TSAD, the first open-source framework for graph neural network (GNN)-based time series anomaly detection, supporting multiple datasets, flexible graph configurations, and reproducible evaluation protocols. Built around a GNN backbone, our approach integrates reconstruction or prediction errors, adaptive thresholding, and attention mechanisms to effectively handle graph structural uncertainty. Experiments on two real-world datasets demonstrate significant improvements in both detection performance and interpretability. Notably, attention-based GNN variants exhibit robustness under incomplete graph structures, while our analysis highlights how common evaluation practices, particularly metric selection and thresholding, can distort result interpretation.
Abstract
There is growing interest in applying graph-based methods to Time Series Anomaly Detection (TSAD), particularly Graph Neural Networks (GNNs), as they naturally model dependencies among multivariate signals. GNNs are typically used as backbones in score-based TSAD pipelines, where anomalies are identified through reconstruction or prediction errors followed by thresholding. Despite promising results, however, the field still lacks standardized frameworks for evaluation and suffers from persistent issues with metric design and interpretation. We therefore present an open-source framework for TSAD using GNNs, designed to support reproducible experimentation across datasets, graph structures, and evaluation strategies. Built with flexibility and extensibility in mind, the framework facilitates systematic comparisons between TSAD models and enables in-depth analysis of performance and interpretability. Using this tool, we evaluate several GNN-based architectures alongside baseline models on two real-world datasets with contrasting structural characteristics. Our results show that GNNs not only improve detection performance but also offer significant gains in interpretability, an especially valuable feature for practical diagnosis. We also find that attention-based GNNs offer robustness when the graph structure is uncertain or inferred. In addition, we reflect on common evaluation practices in TSAD, showing how certain metrics and thresholding strategies can obscure meaningful comparisons. Overall, this work contributes both practical tools and critical insights to advance the development and evaluation of graph-based TSAD systems.
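To make the score-based pipeline described above concrete, the following is a minimal sketch of the step that follows the backbone: turning prediction errors into per-timestep anomaly scores and thresholding them. It is not the framework's code. The function names (`anomaly_scores`, `quantile_threshold`, `f1`), the max-over-sensors aggregation, and the quantile-based threshold are all assumptions chosen for illustration, and the GNN forecasts are replaced by a constant placeholder on synthetic data.

```python
import numpy as np

def anomaly_scores(observed, predicted):
    """Collapse per-sensor absolute errors (T, N) into one score per timestep (T,).

    Taking the max over sensors is an illustrative choice, not the paper's.
    """
    return np.abs(observed - predicted).max(axis=1)

def quantile_threshold(normal_scores, q=0.99):
    """Threshold taken as a quantile of scores on data assumed anomaly-free."""
    return float(np.quantile(normal_scores, q))

def f1(labels, flags):
    """Point-wise F1 between boolean ground-truth labels and boolean detections."""
    tp = int(np.sum(flags & labels))
    fp = int(np.sum(flags & ~labels))
    fn = int(np.sum(~flags & labels))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Synthetic stand-in for a multivariate series with T timesteps and N sensors.
rng = np.random.default_rng(0)
T, N = 200, 4
predicted = np.zeros((T, N))  # placeholder for the backbone's forecasts
val_observed = rng.uniform(-0.1, 0.1, size=(T, N))   # assumed anomaly-free
test_observed = rng.uniform(-0.1, 0.1, size=(T, N))
test_observed[50:55, 2] = 5.0                         # injected anomaly on sensor 2
labels = np.zeros(T, dtype=bool)
labels[50:55] = True

val_scores = anomaly_scores(val_observed, predicted)
test_scores = anomaly_scores(test_observed, predicted)

# The same scores yield different F1 values under different thresholds.
for q in (0.90, 0.99):
    threshold = quantile_threshold(val_scores, q)
    flags = test_scores > threshold
    print(f"q={q:.2f} threshold={threshold:.4f} F1={f1(labels, flags):.3f}")
```

Running the loop with different quantiles shows, on toy data, the kind of sensitivity the abstract refers to: the scores are identical in each run, yet the reported F1 depends on how the threshold was chosen.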