🤖 AI Summary
Standard forecasting benchmarks for cyber-physical systems (CPS) rank models by nominal (clean-data) error and rarely test robustness to sensor faults such as noise, bias, missing values, or temporal misalignment, so they may not reflect real-world stability. This work proposes SensorFault-Bench, a shared CPS sensor-fault stress-test protocol for comparing forecasting architectures, including a zero-shot foundation model, and robustness-improvement methods. It applies standardized fault injection under a common severity model and reports clean mean squared error (MSE), worst-scenario degradation, and worst-scenario fault-time MSE, which separates relative robustness from absolute error. Experiments on four real-world datasets show that models favored by clean MSE can degrade sharply under faults. For example, Chronos-2 matches or trails a last-value naive forecaster in clean MSE on the two single-target datasets and shows the largest worst-scenario degradation on ETTh1 and Traffic. The robustness-improvement methods have complementary strengths: adversarial and randomized training lead where value faults dominate, while fault augmentation leads where availability faults dominate.
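The fault types listed above can be made concrete with a small injection routine. The sketch below is a minimal illustration, not SensorFault-Bench's implementation: the function name `inject_fault`, the severity range [0, 1], the single contiguous fault window, and the way severity maps to noise amplitude, bias size, drop probability, and lag are all hypothetical choices for this example. Misalignment is modeled here as the sensor reporting stale readings delayed by a fixed lag.

```python
import math
import random
import statistics
from typing import List, Tuple

# Hypothetical fault kinds for this sketch only.
FAULT_KINDS = ("noise", "bias", "missing", "misalignment")


def inject_fault(series: List[float], kind: str, severity: float,
                 start: int, stop: int, seed: int = 0) -> Tuple[List[float], List[bool]]:
    """Corrupt series[start:stop] with one fault kind.

    Returns the faulty copy and a boolean mask marking the fault window.
    The severity scaling is an assumed choice, not the benchmark's model.
    """
    if kind not in FAULT_KINDS:
        raise ValueError(f"unknown fault kind: {kind}")
    if not 0.0 <= severity <= 1.0:
        raise ValueError("severity must lie in [0, 1]")
    if not series:
        raise ValueError("series must be non-empty")
    if not 0 <= start <= stop <= len(series):
        raise ValueError("invalid fault window")

    rng = random.Random(seed)  # seeded so the injected fault is reproducible
    # Scale value faults by the series' spread; use 1.0 for a constant series.
    scale = statistics.pstdev(series) or 1.0
    faulty = list(series)
    mask = [start <= t < stop for t in range(len(series))]
    lag = max(1, round(severity * (stop - start)))  # delay used for misalignment

    for t in range(start, stop):
        if kind == "noise":
            # Additive Gaussian noise with std proportional to severity.
            faulty[t] = series[t] + rng.gauss(0.0, severity * scale)
        elif kind == "bias":
            # Constant offset proportional to severity.
            faulty[t] = series[t] + severity * scale
        elif kind == "missing":
            # Each reading in the window is dropped with probability `severity`.
            if rng.random() < severity:
                faulty[t] = math.nan
        else:
            # Misalignment: report the clean reading from `lag` steps earlier.
            faulty[t] = series[max(t - lag, 0)]
    return faulty, mask
```

For example, `inject_fault(clean, "bias", 0.5, start=48, stop=72)` shifts readings 48 to 71 upward by half the series' standard deviation and leaves the rest untouched. The returned mask marks the fault window, so an error metric can be restricted to fault-time positions.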
📝 Abstract
Cyber-physical system (CPS) forecasting models depend on sensor streams with noisy, biased, missing, or temporally misaligned readings, yet standard forecasting evaluation often selects models by nominal error without showing whether they remain robust under such faults. We introduce SensorFault-Bench, a shared CPS-grounded sensor-fault stress-test protocol for evaluating forecasting architectures and robustness-improvement methods, together with an operational taxonomy that organizes the method comparison. Across four real-world datasets and eight scored scenarios governed by a standardized severity model, the protocol reports worst-scenario degradation, clean mean squared error (MSE), and worst-scenario fault-time MSE, separating relative robustness from absolute error. A disjoint fault-transfer split lets methods that train explicitly on faults use adjacent fault families during training, while evaluation uses separate benchmark scenarios. Empirically, forecasting architectures favored by clean MSE can degrade sharply under faults, and clean-MSE rankings can disagree with worst-scenario fault-time error rankings. Chronos-2, the evaluated zero-shot foundation-model representative, matches or trails the last-value naive forecaster in clean MSE on the two single-target datasets and has the largest worst-scenario degradation on ETTh1 and Traffic, where all channels are forecast targets. For the evaluated set of robustness-improvement methods, paired deltas show that degradation reductions are selective: projected gradient descent (PGD) adversarial training and randomized training lead where value faults dominate observed degradation, while fault augmentation leads where availability faults dominate. SensorFault-Bench provides open-source code, documented data access, and reproduction and extension guides, so new datasets, architectures, and robustness-improvement methods can be evaluated under the same CPS sensor-fault robustness protocol.
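The reported quantities and the last-value naive baseline can be sketched as follows. This is an illustrative sketch under stated assumptions, not the benchmark's code: the function names are hypothetical, fault-time MSE is taken as the MSE over positions flagged by a fault mask, and degradation is assumed to be the relative MSE increase over clean MSE, since the abstract does not spell out the exact definitions. Reporting the worst scenario separately for degradation and for fault-time MSE shows how the two views can single out different scenarios.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def mse(y_true: Sequence[float], y_pred: Sequence[float],
        mask: Optional[Sequence[bool]] = None) -> float:
    """Mean squared error, optionally restricted to positions where mask is True."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have equal length")
    keep = [True] * len(y_true) if mask is None else list(mask)
    if len(keep) != len(y_true):
        raise ValueError("mask must match the series length")
    errs = [(a - b) ** 2 for a, b, k in zip(y_true, y_pred, keep) if k]
    if not errs:
        raise ValueError("mask selects no positions")
    return sum(errs) / len(errs)


def last_value_forecast(history: Sequence[float], horizon: int) -> List[float]:
    """Naive baseline: repeat the last observed value over the horizon."""
    if not history:
        raise ValueError("history must be non-empty")
    return [history[-1]] * horizon


def robustness_report(clean_mse: float,
                      scenario_mse: Dict[str, float],
                      scenario_fault_time_mse: Dict[str, float],
                      ) -> Dict[str, Tuple[str, float]]:
    """Summarize one model across fault scenarios (hypothetical definitions).

    Degradation per scenario is assumed to be (faulty MSE - clean MSE) / clean MSE.
    """
    if clean_mse <= 0:
        raise ValueError("clean_mse must be positive")
    degradation = {s: (m - clean_mse) / clean_mse for s, m in scenario_mse.items()}
    worst_deg = max(degradation, key=lambda s: degradation[s])
    worst_ft = max(scenario_fault_time_mse, key=lambda s: scenario_fault_time_mse[s])
    return {
        "worst_scenario_degradation": (worst_deg, degradation[worst_deg]),
        "worst_scenario_fault_time_mse": (worst_ft, scenario_fault_time_mse[worst_ft]),
    }
```

As a toy illustration with made-up numbers: a hypothetical model with clean MSE 2.0 whose MSE rises to 3.0 under a noise scenario and 5.0 under a missing-value scenario has its worst degradation (1.5) under missing values, yet if its fault-time MSE is 7.0 under noise and 6.0 under missing values, the worst fault-time scenario is noise. Keeping both summaries alongside clean MSE avoids collapsing relative robustness and absolute error into one number.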