Mitigating multiple single-event upsets during deep neural network inference using fault-aware training

📅 2025-02-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
Deep neural networks (DNNs) deployed in high-radiation environments suffer inference failures when multiple single-bit single-event upsets (SEUs) corrupt several bits of the model. Existing fault-tolerance techniques often require hardware modifications or fail to model SEU propagation across DNN layers. Method: This paper proposes Fault-Aware Training (FAT), a hardware-agnostic training framework that systematically models SEU propagation across layers and integrates end-to-end differentiable fault injection. FAT explicitly injects multi-point faults during training and introduces gradient-based fault-masking regularization. It further combines weight sensitivity analysis with adversarial robustness training to enhance resilience. Contribution/Results: Evaluated on CIFAR-10 and ImageNet, FAT improves tolerance to multiple SEUs by up to 3× over baseline methods, reduces accuracy degradation under radiation-induced faults, and adds no inference latency or hardware overhead.

📝 Abstract
Deep neural networks (DNNs) are increasingly used in safety-critical applications. Reliable fault analysis and mitigation are essential to ensure their functionality in harsh environments with high radiation levels. This study analyses the impact of multiple single-bit single-event upsets on DNNs by performing fault injection at the level of the DNN model. Additionally, a fault-aware training (FAT) methodology is proposed that improves the robustness of DNNs to faults without any modification to the hardware. Experimental results show that the FAT methodology improves fault tolerance by up to a factor of 3.
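
To make "fault injection at the level of a DNN model" concrete, the following is a minimal sketch of how multiple single-bit upsets could be emulated in software on a float32 weight tensor. The function name `inject_bit_flips`, the uniform choice of weight and bit position, and the float32 storage format are all assumptions made for illustration; the abstract does not specify the fault model used in the paper.

```python
import numpy as np

def inject_bit_flips(weights, n_flips, rng):
    """Return a copy of `weights` (as float32) with `n_flips` single-bit upsets.

    Assumed fault model (illustrative only): each upset flips one uniformly
    chosen bit of one uniformly chosen weight. Two upsets that hit the same
    bit of the same weight cancel each other out.
    """
    # Copy so the caller's clean weights are never modified.
    faulty = np.array(weights, dtype=np.float32, order="C")
    # Reinterpret the same bytes as unsigned integers to get bit-level access.
    bits = faulty.view(np.uint32).reshape(-1)
    for _ in range(n_flips):
        index = int(rng.integers(bits.size))   # which weight is hit
        position = int(rng.integers(32))       # which of its 32 bits is hit
        bits[index] ^= np.uint32(1 << position)
    return faulty

# Example: three upsets in a small weight matrix.
rng = np.random.default_rng(0)
clean = np.ones((4, 5), dtype=np.float32)
corrupted = inject_bit_flips(clean, n_flips=3, rng=rng)
```

The severity of a single upset depends strongly on the bit position. For the float32 value 1.0, flipping bit 31 (the sign) yields -1.0, while flipping bit 30 (the most significant exponent bit) yields infinity, which is why a handful of upsets can be enough to break inference.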
Problem

Research questions and friction points this paper is trying to address.

Enhancing DNN resilience to single-event upsets
Implementing fault-aware training without hardware changes
Improving DNN robustness in high-radiation environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fault-aware training for DNNs
No hardware modification needed
Improves fault tolerance by up to a factor of 3
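
The page does not spell out how fault-aware training is carried out. One common pattern, sketched here purely as an assumption, is to corrupt a copy of the weights on every training step, compute the loss gradient with the corrupted copy, and apply the update to the clean weights. The toy model (logistic regression), the helper names `flip_random_bits` and `train_fault_aware`, and the choice to skip exponent bits are all hypothetical and are not taken from the paper.

```python
import numpy as np

# Bit positions eligible for flipping: 23 mantissa bits plus the sign bit.
# Exponent bits are skipped only to keep this toy example free of inf/NaN;
# that restriction is an assumption of the sketch, not a realistic fault model.
ALLOWED_BITS = list(range(23)) + [31]

def flip_random_bits(w, n_flips, rng):
    """Copy float32 weights `w` and flip `n_flips` randomly chosen bits."""
    faulty = np.array(w, dtype=np.float32, order="C")
    bits = faulty.view(np.uint32).reshape(-1)
    for _ in range(n_flips):
        index = int(rng.integers(bits.size))
        position = ALLOWED_BITS[int(rng.integers(len(ALLOWED_BITS)))]
        bits[index] ^= np.uint32(1 << position)
    return faulty

def train_fault_aware(x, y, n_flips, steps=500, lr=0.5, seed=0):
    """Train logistic regression while injecting bit flips on every step."""
    rng = np.random.default_rng(seed)
    w = np.zeros(x.shape[1], dtype=np.float32)
    for _ in range(steps):
        w_faulty = flip_random_bits(w, n_flips, rng)  # corrupted copy
        p = 1.0 / (1.0 + np.exp(-(x @ w_faulty)))     # forward pass sees faults
        grad = x.T @ (p - y) / len(y)                 # logistic-loss gradient
        w = (w - lr * grad).astype(np.float32)        # update the clean weights
    return w

# Toy usage on a synthetic, linearly separable problem.
data_rng = np.random.default_rng(1)
features = data_rng.normal(size=(200, 4))
labels = (features @ np.array([2.0, -1.0, 0.5, 1.0]) > 0).astype(np.float64)
weights = train_fault_aware(features, labels, n_flips=1)
```

Because faults are injected only in software during training, a scheme of this kind leaves the deployed model and the hardware unchanged, which is consistent with the "no hardware modification" point above.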
Toon Vinck
Magics Technologies, Cipalstraat 3, 2440 Geel, Belgium; ESAT-ADVISE, KU Leuven, Kleinhoefstraat 4, 2440 Geel, Belgium; Dept. of Computer Science, Leuven.AI, KU Leuven, Celestijnenlaan 200a, 3001 Leuven, Belgium
N. Jonckers
Magics Technologies, Cipalstraat 3, 2440 Geel, Belgium; ESAT-ADVISE, KU Leuven, Kleinhoefstraat 4, 2440 Geel, Belgium
Gert Dekkers
Magics Technologies, Cipalstraat 3, 2440 Geel, Belgium
J. Prinzie
ESAT-ADVISE, KU Leuven, Kleinhoefstraat 4, 2440 Geel, Belgium
Peter Karsmakers
KU Leuven
machine learning, digital signal processing, biomedical technology