🤖 AI Summary
Deep learning (DL) predictive models in industrial cyber-physical systems (CPS) suffer from insufficient robustness, and existing evaluation methods fail to reflect realistic perturbation scenarios. Method: We propose a practical, distributionally robust definition and a systematic evaluation framework. For the first time, we model real-world perturbations—including sensor drift, measurement noise, and irregular sampling—and integrate time-series perturbation generation, distributional shift quantification, and comparative evaluation across diverse DL architectures (RNN, CNN, Transformer, SSM). We establish the first robustness benchmark featuring multi-source real-world CPS time-series data and supporting reproducible assessment. Contribution/Results: We introduce a standardized robustness scoring mechanism for industrial CPS, deliver a ranked robustness comparison of mainstream models with root-cause attribution analysis, and open-source a comprehensive toolchain that significantly enhances the reliability of model selection and architecture design.
📝 Abstract
Cyber-Physical Systems (CPS) in domains such as manufacturing and energy distribution generate complex time series data crucial for Prognostics and Health Management (PHM). While Deep Learning (DL) methods have demonstrated strong forecasting capabilities, their adoption in industrial CPS remains limited due to insufficient robustness. Existing robustness evaluations primarily focus on formal verification or adversarial perturbations, inadequately representing the complexities encountered in real-world CPS scenarios. To address this, we introduce a practical robustness definition grounded in distributional robustness, explicitly tailored to industrial CPS, and propose a systematic framework for robustness evaluation. Our framework simulates realistic disturbances, such as sensor drift, noise, and irregular sampling, enabling thorough robustness analyses of forecasting models on real-world CPS datasets. The robustness definition provides a standardized score to quantify and compare model performance across diverse datasets, assisting in informed model selection and architecture design. Through extensive empirical studies evaluating prominent DL architectures (including recurrent, convolutional, attention-based, modular, and structured state-space models), we demonstrate the applicability and effectiveness of our approach. We publicly release our robustness benchmark to encourage further research and reproducibility.
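The perturbation types named in the abstract (sensor drift, measurement noise, irregular sampling) can be illustrated with a minimal sketch. Note this is an assumption-laden illustration, not the paper's released toolchain: the function names, parameter values, and the degradation-ratio score at the end are all hypothetical choices made here for concreteness.

```python
import numpy as np

def add_sensor_drift(x: np.ndarray, slope: float = 0.01) -> np.ndarray:
    """Linear drift: offset grows over time, mimicking sensor calibration loss."""
    return x + slope * np.arange(len(x))

def add_measurement_noise(x: np.ndarray, sigma: float = 0.1,
                          seed: int = 0) -> np.ndarray:
    """Additive i.i.d. Gaussian noise on each measurement."""
    rng = np.random.default_rng(seed)
    return x + rng.normal(0.0, sigma, size=len(x))

def irregular_sampling(x: np.ndarray, drop_prob: float = 0.2,
                       seed: int = 0):
    """Randomly drop samples; returns surviving timestamps and values."""
    rng = np.random.default_rng(seed)
    keep = rng.random(len(x)) > drop_prob
    return np.arange(len(x))[keep], x[keep]

def degradation_ratio(err_clean: float, err_perturbed: float) -> float:
    """One hypothetical scalar robustness score: error inflation under
    perturbation (1.0 = fully robust, larger = more degradation)."""
    return err_perturbed / err_clean

# Apply each perturbation to a synthetic sensor signal.
signal = np.sin(np.linspace(0.0, 4.0 * np.pi, 200))
drifted = add_sensor_drift(signal)
noisy = add_measurement_noise(signal)
timestamps, sparse = irregular_sampling(signal)
```

A benchmark in this spirit would feed both the clean and perturbed series to each candidate architecture and compare forecast errors, yielding a score comparable across models and datasets.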