🤖 AI Summary
Memory bit-flips in stored weights can cause functional failures in safety-critical deep neural networks (DNNs), such as those used in autonomous driving. To address this, the paper proposes SPW, a fault-tolerant approach that combines error-correcting codes (ECC) with weight masking. SPW corrects single-bit errors and masks multi-bit errors by setting the affected weight to zero, which addresses the limitation that conventional ECC schemes correct only single-bit errors. At a bit-error rate of 10⁻¹, SPW improves model accuracy by more than 300% relative to applying ECC alone, at the cost of 47.5% area overhead. The approach is evaluated with a statistical fault injection campaign.
📝 Abstract
Deep Neural Networks (DNNs) have achieved great success in solving a wide range of machine learning problems. Recently, they have been deployed in datacenters (potentially for business-critical or industrial applications) and in safety-critical systems such as self-driving cars. Their correct functionality in the presence of bit-flip errors on DNN parameters stored in memory is therefore key to their applicability in safety-critical applications. In this paper, a fault tolerance approach based on Error Correcting Codes (ECC), called SPW, is proposed to ensure the correct functionality of DNNs in the presence of bit-flip faults. In the proposed approach, an error is detected using the stored ECC; it is then corrected if it is a single-bit error, and otherwise the weight is set entirely to zero (i.e., masked). A statistical fault injection campaign is proposed and used to investigate the efficacy of the proposed approach. The experimental results show that, at a Bit Error Rate of 10^(-1), the accuracy of the DNN increases by more than 300% compared with the case where only the ECC technique is applied, at the expense of just 47.5% area overhead.
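The correct-or-mask policy described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say which ECC or weight format SPW uses, so everything concrete here is an assumption made for illustration: 8-bit weights, an extended Hamming (13,8) SECDED code, and the hypothetical helper names `encode`, `syndrome`, and `decode`. It is not the authors' implementation.

```python
# Assumed layout: positions 1..12 form a Hamming code, position 0 holds
# an overall parity bit (SECDED). Bit i of the codeword is position i.
DATA_POS = [3, 5, 6, 7, 9, 10, 11, 12]   # positions carrying the 8 data bits
PARITY_POS = [1, 2, 4, 8]                # Hamming parity positions


def encode(weight: int) -> int:
    """Encode an 8-bit weight into a 13-bit SECDED codeword."""
    assert 0 <= weight < 256
    word = 0
    for i, pos in enumerate(DATA_POS):
        if (weight >> i) & 1:
            word |= 1 << pos
    for p in PARITY_POS:
        # Parity over every position whose index has bit p set.
        parity = 0
        for pos in range(1, 13):
            if pos & p and (word >> pos) & 1:
                parity ^= 1
        if parity:
            word |= 1 << p
    # Overall parity bit makes the total number of 1s even.
    if bin(word).count("1") % 2:
        word |= 1
    return word


def syndrome(word: int) -> int:
    """XOR of the positions (1..12) of all set bits; 0 for a valid codeword."""
    s = 0
    for pos in range(1, 13):
        if (word >> pos) & 1:
            s ^= pos
    return s


def decode(word: int) -> tuple[int, str]:
    """Return (weight, status): correct a single-bit error, mask otherwise."""
    s = syndrome(word)
    odd = bin(word).count("1") % 2 == 1
    if s == 0 and not odd:
        status = "clean"
    elif odd and s <= 12:
        # Odd overall parity: treat as a single-bit error at position s
        # (s == 0 means the overall parity bit itself flipped).
        word ^= 1 << s
        status = "corrected"
    else:
        # Even parity with a nonzero syndrome (double error), or an
        # impossible syndrome: uncorrectable, so mask the weight to zero.
        return 0, "masked"
    weight = 0
    for i, pos in enumerate(DATA_POS):
        weight |= ((word >> pos) & 1) << i
    return weight, status


if __name__ == "__main__":
    cw = encode(0x5A)
    print(decode(cw))                        # (90, 'clean')
    print(decode(cw ^ (1 << 6)))             # (90, 'corrected')
    print(decode(cw ^ (1 << 6) ^ (1 << 9)))  # (0, 'masked')
```

Masking to zero removes the weight's contribution instead of letting a corrupted value, for example one with a flipped high-order bit, propagate through the network. Note that a SECDED code like the one assumed here can still miscorrect some patterns of three or more flips; how SPW handles such cases is not stated in the abstract.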