AI Summary
The adversarial robustness of neural operators in digital twins for nuclear thermal-hydraulics remains insufficiently evaluated, posing potential safety risks. This work systematically investigates their sensitivity to extremely sparse, physically realizable input perturbations using gradient-free optimization, specifically differential evolution. We propose a two-factor vulnerability model that combines effective perturbation dimension with sensitivity magnitude, revealing architectural differences in exploitability. Notably, we demonstrate for the first time that boundary-condition perturbations can completely evade z-score-based anomaly detection, with a 100% evasion rate. Experiments show that altering fewer than 1% of input components can escalate the relative L2 prediction error from 1.5% to 37-63%, underscoring severe vulnerabilities in safety-critical applications.
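A minimal sketch of such a sparse, gradient-free attack, assuming a hypothetical trained operator `model` that maps an input/sensor vector to a predicted field; the function names, perturbation bounds, and solver settings below are illustrative, not the paper's exact configuration:

```python
# Sparse adversarial attack via differential evolution (illustrative sketch).
# Assumes `model` is a callable mapping a 1-D numpy input vector to a
# 1-D numpy output field; `attack_idx` selects the <1% of input
# components allowed to change.
import numpy as np
from scipy.optimize import differential_evolution

def relative_l2_error(y_pred, y_ref):
    """Relative L2 error between perturbed and clean predictions."""
    return np.linalg.norm(y_pred - y_ref) / np.linalg.norm(y_ref)

def sparse_attack(model, x_clean, attack_idx, delta_max=0.05):
    """Maximize prediction error while perturbing only `attack_idx` inputs."""
    y_clean = model(x_clean)

    def neg_error(delta):
        x_adv = x_clean.copy()
        x_adv[attack_idx] += delta          # sparse, bounded perturbation
        return -relative_l2_error(model(x_adv), y_clean)

    bounds = [(-delta_max, delta_max)] * len(attack_idx)
    result = differential_evolution(neg_error, bounds, maxiter=100,
                                    popsize=20, seed=0, polish=False)
    x_adv = x_clean.copy()
    x_adv[attack_idx] += result.x
    return x_adv, -result.fun               # adversarial input, achieved error
```

Because the objective is queried as a black box, this search needs no gradients, which is what makes it effective on architectures where gradient-based methods such as PGD stall.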
Abstract
Operator learning models are rapidly emerging as the predictive core of digital twins for nuclear and energy systems, promising real-time field reconstruction from sparse sensor measurements. Yet their robustness to adversarial perturbations remains uncharacterized, a critical gap for deployment in safety-critical systems. Here we show that neural operators are acutely vulnerable to extremely sparse (fewer than 1% of inputs), physically plausible perturbations that exploit their sensitivity to boundary conditions. Using gradient-free differential evolution across four operator architectures, we demonstrate that minimal modifications trigger catastrophic prediction failures, increasing relative $L_2$ error from $\sim$1.5% (validated accuracy) to 37-63% while remaining completely undetectable by standard validation metrics. Notably, 100% of successful single-point attacks pass z-score anomaly detection. We introduce the effective perturbation dimension $d_{\text{eff}}$, a Jacobian-based diagnostic that, combined with sensitivity magnitude, yields a two-factor vulnerability model. This model explains why architectures with extreme sensitivity concentration (POD-DeepONet, $d_{\text{eff}} \approx 1$) are not necessarily the most exploitable, since their low-rank output projections cap the maximum achievable error, whereas moderate concentration with sufficient amplification (S-DeepONet, $d_{\text{eff}} \approx 4$) produces the highest attack success. Gradient-free search outperforms gradient-based alternatives (PGD) on architectures with gradient pathologies, while random perturbations of equal magnitude achieve near-zero success rates, confirming that the discovered vulnerabilities are structural. Our findings expose a previously overlooked attack surface in operator learning models and establish that these models require robustness guarantees beyond standard validation before deployment.
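For reference, the z-score screen that the successful attacks evade is typically a per-component check against training statistics. A minimal sketch, assuming hypothetical per-component `train_mean` and `train_std` arrays:

```python
# Standard z-score anomaly screen (illustrative sketch).
# Flags an input only if some component deviates more than `threshold`
# standard deviations from the training-set statistics.
import numpy as np

def passes_zscore_check(x, train_mean, train_std, threshold=3.0):
    """Return True if x raises no z-score alarm (i.e., the input evades detection)."""
    z = np.abs((x - train_mean) / train_std)
    return bool(np.all(z < threshold))
```

A single-point perturbation that stays within a few standard deviations of its component's training distribution passes this check trivially, which is consistent with the reported 100% evasion rate for successful single-point attacks.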
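The abstract does not spell out the formula for $d_{\text{eff}}$. One plausible Jacobian-based realization with the described behavior ($d_{\text{eff}} \approx 1$ when sensitivity concentrates in a single input direction) is the participation ratio of the squared singular values of the input-output Jacobian; the sketch below is an assumption for illustration, not the paper's definition:

```python
# Hedged sketch of an effective perturbation dimension from the
# input-output Jacobian J. Using the participation ratio of squared
# singular values: d_eff = (sum_i s_i^2)^2 / sum_i s_i^4, which gives
# 1 when one direction dominates and k for k equally sensitive directions.
import numpy as np

def effective_perturbation_dimension(jacobian):
    """Participation-ratio estimate of how many input directions the model is sensitive to."""
    s = np.linalg.svd(jacobian, compute_uv=False)   # singular values of J
    s2 = s ** 2
    return (s2.sum() ** 2) / (s2 ** 2).sum()
```

Under this reading, sensitivity magnitude (e.g., the largest singular value) and $d_{\text{eff}}$ supply the two axes of the vulnerability model: how strongly perturbations are amplified, and across how many input directions that amplification is spread.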