AI Summary
This study investigates how robust deep neural networks deployed on edge devices are to hardware-induced bit-flip errors in model parameters. Combining theoretical modeling with empirical analysis, it evaluates how numerical precision, network depth, sparsity, and bounded activations affect fault tolerance, with particular attention to logic-based and lookup table (LUT)-based architectures. The work reframes fault tolerance as an inherent architectural property. It reports that LUT networks benefit jointly from low precision and high sparsity, and that logic-based designs show a distinctive even-layer recovery effect. Using an expected mean squared error model, comparisons across numerical formats, and ablation studies on MLPerf Tiny, the study finds that shallow, sparse, low-precision networks with bounded activations are more robust; LUT models in particular remain stable under severe noise, offering a favorable trade-off between accuracy and resilience.
Abstract
The deployment of deep neural networks (DNNs) in safety-critical edge environments necessitates robustness against hardware-induced bit-flip errors. While empirical studies indicate that reducing numerical precision can improve fault tolerance, the theoretical basis of this phenomenon remains underexplored. In this work, we study resilience as a structural property of neural architectures rather than solely as a property of a dataset-specific trained solution. By deriving the expected mean squared error (MSE) under independent parameter bit flips across multiple numerical formats and layer primitives, we show that lower precision, higher sparsity, bounded activations, and shallower depth are consistently favored under this corruption model. We then argue that logic-based and lookup-based neural networks realize the joint limit of these design trends. Through ablation studies on the MLPerf Tiny benchmark suite, we show that the observed empirical trends are consistent with the theoretical predictions, and that LUT-based models remain highly stable in corruption regimes where standard floating-point models fail sharply. Furthermore, we identify a novel even-layer recovery effect unique to logic-based architectures and analyze the structural conditions under which it emerges. Overall, our results suggest that shifting from continuous arithmetic weights to discrete Boolean lookups can provide a favorable accuracy-resilience trade-off for hardware fault tolerance.
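As a rough illustration of the corruption model described above, the following sketch computes the expected squared error of a single parameter stored as an unsigned integer code when each bit flips independently with probability `p`. It is a toy example under stated assumptions, not the paper's derivation: the function names (`flip_bits`, `analytic_mse`, `monte_carlo_mse`), the uniform distribution over codes, and the unsigned fixed-width format are all chosen here for illustration. Under those assumptions the cross terms between bits cancel, and the expected squared error is `p * (4**bits - 1) / 3` in units of one squared least-significant bit.

```python
import random


def flip_bits(value: int, bits: int, p: float, rng: random.Random) -> int:
    """Flip each of the `bits` low-order bits of `value` independently with probability p."""
    mask = 0
    for i in range(bits):
        if rng.random() < p:
            mask |= 1 << i
    return value ^ mask


def analytic_mse(bits: int, p: float) -> float:
    """Expected squared error for a uniformly random unsigned code (assumption), in LSB^2.

    Flipping bit i changes the code by +/- 2**i. For a uniform code the signs are
    independent and zero-mean, so only the diagonal terms p * 4**i survive.
    """
    return p * (4 ** bits - 1) / 3


def monte_carlo_mse(bits: int, p: float, trials: int = 200_000, seed: int = 0) -> float:
    """Estimate the same quantity by sampling random codes and random bit flips."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        w = rng.randrange(1 << bits)
        total += (flip_bits(w, bits, p, rng) - w) ** 2
    return total / trials


if __name__ == "__main__":
    for bits in (4, 8):
        print(bits, analytic_mse(bits, 0.01), monte_carlo_mse(bits, 0.01))
```

The closed form grows with the number of bits because the most significant bit dominates the error. Note that this sketch measures error in raw integer codes and ignores the format's scale factor, sign bits, and floating-point exponent fields, so it should not be read as a comparison between the numerical formats studied in the paper.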