Quantifying Robustness: A Benchmarking Framework for Deep Learning Forecasting in Cyber-Physical Systems

📅 2025-04-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
Deep learning (DL) predictive models in industrial cyber-physical systems (CPS) suffer from insufficient robustness, and existing evaluation methods fail to reflect realistic perturbation scenarios. Method: We propose a practical, distributionally robust definition and a systematic evaluation framework. For the first time, we model real-world perturbations—including sensor drift, measurement noise, and irregular sampling—and integrate time-series perturbation generation, distributional shift quantification, and comparative evaluation across diverse DL architectures (RNN, CNN, Transformer, SSM). We establish the first robustness benchmark featuring multi-source real-world CPS time-series data and supporting reproducible assessment. Contribution/Results: We introduce a standardized robustness scoring mechanism for industrial CPS, deliver a ranked robustness comparison of mainstream models with root-cause attribution analysis, and open-source a comprehensive toolchain that significantly enhances the reliability of model selection and architecture design.
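The summary names three real-world perturbation types (sensor drift, measurement noise, irregular sampling) but this page does not include the paper's implementation. A minimal sketch of what such perturbation generators could look like — with all function names and default parameters assumed, not taken from the paper — is:

```python
import numpy as np

def add_sensor_drift(x, slope=0.01):
    """Add a slow linear offset to a 1-D series, mimicking gradual sensor drift."""
    return x + slope * np.arange(len(x))

def add_measurement_noise(x, sigma=0.1, rng=None):
    """Add i.i.d. Gaussian measurement noise."""
    rng = rng or np.random.default_rng(0)
    return x + rng.normal(0.0, sigma, size=x.shape)

def irregular_sampling(x, drop_rate=0.2, rng=None):
    """Randomly drop samples to mimic irregular sampling; dropped
    positions are forward-filled with the last observed value."""
    rng = rng or np.random.default_rng(0)
    keep = rng.random(len(x)) >= drop_rate
    keep[0] = True  # always keep the first sample
    y = x.copy()
    last = x[0]
    for i in range(len(x)):
        if keep[i]:
            last = x[i]
        else:
            y[i] = last
    return y
```

Each generator maps a clean series to a perturbed one of the same length, so a forecasting model can be evaluated on both and the resulting errors compared.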

📝 Abstract
Cyber-Physical Systems (CPS) in domains such as manufacturing and energy distribution generate complex time series data crucial for Prognostics and Health Management (PHM). While Deep Learning (DL) methods have demonstrated strong forecasting capabilities, their adoption in industrial CPS remains limited due to insufficient robustness. Existing robustness evaluations primarily focus on formal verification or adversarial perturbations, inadequately representing the complexities encountered in real-world CPS scenarios. To address this, we introduce a practical robustness definition grounded in distributional robustness, explicitly tailored to industrial CPS, and propose a systematic framework for robustness evaluation. Our framework simulates realistic disturbances, such as sensor drift, noise, and irregular sampling, enabling thorough robustness analyses of forecasting models on real-world CPS datasets. The robustness definition provides a standardized score to quantify and compare model performance across diverse datasets, assisting in informed model selection and architecture design. Through extensive empirical studies evaluating prominent DL architectures (including recurrent, convolutional, attention-based, modular, and structured state-space models), we demonstrate the applicability and effectiveness of our approach. We publicly release our robustness benchmark to encourage further research and reproducibility.
Problem

Research questions and friction points this paper is trying to address.

Evaluating robustness of DL forecasting in CPS
Addressing lack of realistic robustness benchmarks
Standardizing robustness scoring for model comparison
Innovation

Methods, ideas, or system contributions that make the work stand out.

Defines robustness via distributional robustness for CPS
Simulates realistic disturbances for thorough model analysis
Provides standardized score for model comparison and selection
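The innovations above center on a standardized robustness score for comparing models. The paper's exact formula is not reproduced on this page; one plausible form — a hypothetical ratio of clean-data error to worst-case error across the perturbation suite, clipped to (0, 1] so that 1 means no degradation — might look like:

```python
def robustness_score(clean_err, perturbed_errs):
    """Hypothetical standardized score in (0, 1].

    clean_err:      forecasting error (e.g. MSE) on unperturbed data
    perturbed_errs: errors under each simulated disturbance
    Returns 1.0 when no perturbation worsens the error; smaller
    values indicate larger worst-case degradation.
    """
    worst = max(perturbed_errs)
    return min(1.0, clean_err / max(worst, 1e-12))
```

Such a score is dataset-relative (it normalizes by the model's own clean error), which is what makes rankings comparable across diverse datasets, as the summary claims for the paper's mechanism.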
Alexander Windmann
Helmut Schmidt University
AI Safety · Robustness · OOD Generalization · AI Quality Assurance · Industrial AI
H. Steude
Institute of Artificial Intelligence, Helmut Schmidt University, Hamburg, Germany
Daniel Boschmann
Institute of Artificial Intelligence, Helmut Schmidt University, Hamburg, Germany
Oliver Niggemann
Helmut-Schmidt-Universität / Universität der Bundeswehr Hamburg
Artificial Intelligence · Machine Learning · Automation · Diagnosis · Production