AI Summary
The adversarial robustness of neural operators in digital twins for nuclear thermal-hydraulics remains insufficiently evaluated, posing potential safety risks. This work systematically investigates their sensitivity to extremely sparse, physically realizable input perturbations using gradient-free optimization, specifically differential evolution. We propose a two-factor vulnerability model that combines effective perturbation dimension with sensitivity magnitude, revealing architectural differences in exploitability. Notably, we demonstrate for the first time that boundary-condition perturbations can completely evade z-score-based anomaly detection, with a 100% evasion rate. Experiments show that altering fewer than 1% of input components can escalate the relative L2 prediction error from 1.5% to 37-63%, underscoring severe vulnerabilities in safety-critical applications.
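A minimal sketch of such a sparse, gradient-free attack, assuming a hypothetical trained operator `model` that maps an input/sensor vector to a predicted field; the function names, perturbation bounds, and solver settings below are illustrative, not the paper's exact configuration:

```python
# Sparse adversarial attack via differential evolution (illustrative sketch).
# Assumes `model` is a callable mapping a 1-D numpy input vector to a
# 1-D numpy output field; `attack_idx` selects the <1% of input
# components allowed to change.
import numpy as np
from scipy.optimize import differential_evolution

def relative_l2_error(y_pred, y_ref):
    """Relative L2 error between perturbed and clean predictions."""
    return np.linalg.norm(y_pred - y_ref) / np.linalg.norm(y_ref)

def sparse_attack(model, x_clean, attack_idx, delta_max=0.05):
    """Maximize prediction error while perturbing only `attack_idx` inputs."""
    y_clean = model(x_clean)

    def neg_error(delta):
        x_adv = x_clean.copy()
        x_adv[attack_idx] += delta          # sparse, bounded perturbation
        return -relative_l2_error(model(x_adv), y_clean)

    bounds = [(-delta_max, delta_max)] * len(attack_idx)
    result = differential_evolution(neg_error, bounds, maxiter=100,
                                    popsize=20, seed=0, polish=False)
    x_adv = x_clean.copy()
    x_adv[attack_idx] += result.x
    return x_adv, -result.fun               # adversarial input, achieved error
```

Because the objective is queried as a black box, this search needs no gradients, which is what makes it effective on architectures where gradient-based methods such as PGD stall.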
Abstract
Operator learning models are rapidly emerging as the predictive core of digital twins for nuclear and energy systems, promising real-time field reconstruction from sparse sensor measurements. Yet their robustness to adversarial perturbations remains uncharacterized, a critical gap for deployment in safety-critical systems. Here we show that neural operators are acutely vulnerable to extremely sparse (fewer than 1% of inputs), physically plausible perturbations that exploit their sensitivity to boundary conditions. Using gradient-free differential evolution across four operator architectures, we demonstrate that minimal modifications trigger catastrophic prediction failures, increasing relative $L_2$ error from $\sim$1.5% (validated accuracy) to 37-63% while remaining completely undetectable by standard validation metrics. Notably, 100% of successful single-point attacks pass z-score anomaly detection. We introduce the effective perturbation dimension $d_{\text{eff}}$, a Jacobian-based diagnostic that, combined with sensitivity magnitude, yields a two-factor vulnerability model. This model explains why architectures with extreme sensitivity concentration (POD-DeepONet, $d_{\text{eff}} \approx 1$) are not necessarily the most exploitable, since their low-rank output projections cap the maximum achievable error, whereas moderate concentration with sufficient amplification (S-DeepONet, $d_{\text{eff}} \approx 4$) produces the highest attack success. Gradient-free search outperforms gradient-based alternatives (PGD) on architectures with gradient pathologies, while random perturbations of equal magnitude achieve near-zero success rates, confirming that the discovered vulnerabilities are structural. Our findings expose a previously overlooked attack surface in operator learning models and establish that these models require robustness guarantees beyond standard validation before deployment.
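For reference, the z-score screen that the successful attacks evade is typically a per-component check against training statistics. A minimal sketch, assuming hypothetical per-component `train_mean` and `train_std` arrays:

```python
# Standard z-score anomaly screen (illustrative sketch).
# Flags an input only if some component deviates more than `threshold`
# standard deviations from the training-set statistics.
import numpy as np

def passes_zscore_check(x, train_mean, train_std, threshold=3.0):
    """Return True if x raises no z-score alarm (i.e., the input evades detection)."""
    z = np.abs((x - train_mean) / train_std)
    return bool(np.all(z < threshold))
```

A single-point perturbation that stays within a few standard deviations of its component's training distribution passes this check trivially, which is consistent with the reported 100% evasion rate for successful single-point attacks.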
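The abstract does not spell out the formula for $d_{\text{eff}}$. One plausible Jacobian-based realization with the described behavior ($d_{\text{eff}} \approx 1$ when sensitivity concentrates in a single input direction) is the participation ratio of the squared singular values of the input-output Jacobian; the sketch below is an assumption for illustration, not the paper's definition:

```python
# Hedged sketch of an effective perturbation dimension from the
# input-output Jacobian J. Using the participation ratio of squared
# singular values: d_eff = (sum_i s_i^2)^2 / sum_i s_i^4, which gives
# 1 when one direction dominates and k for k equally sensitive directions.
import numpy as np

def effective_perturbation_dimension(jacobian):
    """Participation-ratio estimate of how many input directions the model is sensitive to."""
    s = np.linalg.svd(jacobian, compute_uv=False)   # singular values of J
    s2 = s ** 2
    return (s2.sum() ** 2) / (s2 ** 2).sum()
```

Under this reading, sensitivity magnitude (e.g., the largest singular value) and $d_{\text{eff}}$ supply the two axes of the vulnerability model: how strongly perturbations are amplified, and across how many input directions that amplification is spread.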