🤖 AI Summary
This paper investigates how time-series cross-validation (TSCV) strategies affect the reliability of evaluating multivariate time-series (MTS) subsequence anomaly detection—particularly for fault detection—by examining the interplay between temporal dependency preservation and model generalization. We propose a hierarchical evaluation framework and systematically compare sliding-window versus walk-forward TSCV strategies across multiple MTS fault datasets, jointly assessing models including Random Forest, LSTM, and TCN. Results show that sliding-window TSCV significantly improves median AUC-PR for deep models (+12.7% on average), reduces inter-fold performance variance (−41%), and improves robustness for models sensitive to local temporal continuity; it also preserves fault patterns more effectively at low fold counts. Crucially, we identify TSCV partitioning structure—not merely temporal ordering—as a key determinant of generalization performance. This work provides a methodological foundation for robust evaluation of MTS anomaly detection systems.
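The summary's headline metrics are median AUC-PR and inter-fold variance. A minimal sketch of how such per-fold statistics could be aggregated, with AUC-PR approximated by average precision; the fold data and function names below are illustrative, not taken from the paper:

```python
from statistics import median, pvariance

def average_precision(y_true, scores):
    """AUC-PR approximated as average precision: the mean of precision
    at each rank position where a true anomaly is retrieved."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp = fp = 0
    ap = 0.0
    total_pos = sum(y_true)
    for i in order:
        if y_true[i]:
            tp += 1
            ap += tp / (tp + fp)  # precision at this recall point
        else:
            fp += 1
    return ap / total_pos if total_pos else 0.0

# Hypothetical per-fold labels and anomaly scores from one TSCV scheme:
folds = [
    ([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1]),  # anomalies ranked first
    ([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]),  # one false alarm ranked high
]
fold_aps = [average_precision(y, s) for y, s in folds]
print("median AUC-PR:", median(fold_aps))        # 0.9166...
print("inter-fold variance:", pvariance(fold_aps))
```

Comparing these two numbers between validation schemes is exactly the kind of fold-level analysis the paper reports.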
📝 Abstract
Evaluating anomaly detection in multivariate time series (MTS) requires careful consideration of temporal dependencies, particularly when detecting subsequence anomalies common in fault detection scenarios. While time series cross-validation (TSCV) techniques aim to preserve temporal ordering during model evaluation, their impact on classifier performance remains underexplored. This study systematically investigates the effect of TSCV strategy on the precision-recall characteristics of classifiers trained to detect fault-like anomalies in MTS datasets. We compare walk-forward (WF) and sliding window (SW) methods across a range of validation partition configurations and classifier types, including shallow learners and deep learning (DL) classifiers. Results show that SW consistently yields higher median AUC-PR scores and reduced fold-to-fold performance variance, particularly for deep architectures sensitive to localized temporal continuity. Furthermore, we find that classifier generalization is sensitive to the number and structure of temporal partitions, with overlapping windows preserving fault signatures more effectively at lower fold counts. A classifier-level stratified analysis reveals that certain algorithms, such as random forests (RF), maintain stable performance across validation schemes, whereas others exhibit marked sensitivity. This study demonstrates that TSCV design materially affects benchmarking of anomaly detection models on streaming time series and provides guidance for selecting evaluation strategies in temporally structured learning environments.
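The distinction between the two schemes compared here is how the training window evolves: walk-forward (WF) expands the training set fold by fold, while sliding window (SW) caps it at a fixed length so each fold trains on only the most recent history. A dependency-free sketch of both splitters (function names and sizes are illustrative, not the paper's implementation):

```python
def walk_forward_splits(n, n_folds, test_size):
    """WF: training window expands; each fold trains on all prior samples."""
    for k in range(n_folds):
        train_end = n - (n_folds - k) * test_size
        yield (list(range(0, train_end)),
               list(range(train_end, train_end + test_size)))

def sliding_window_splits(n, n_folds, test_size, train_size):
    """SW: training window has fixed length and slides forward with the fold."""
    for k in range(n_folds):
        train_end = n - (n_folds - k) * test_size
        yield (list(range(max(0, train_end - train_size), train_end)),
               list(range(train_end, train_end + test_size)))

# 100 time steps, 4 folds of 10 test samples each:
wf = list(walk_forward_splits(100, 4, 10))
sw = list(sliding_window_splits(100, 4, 10, train_size=30))
print([len(tr) for tr, _ in wf])  # [60, 70, 80, 90] -- expanding
print([len(tr) for tr, _ in sw])  # [30, 30, 30, 30] -- fixed
```

Both splitters keep every test index strictly after its training indices, preserving temporal ordering; scikit-learn's `TimeSeriesSplit` realizes the same two behaviors via its `max_train_size` parameter (unset for WF-style expanding windows, set for SW-style fixed windows).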