🤖 AI Summary
To reduce the performance and memory overhead that the FaR (Forget and Rewire) defense imposes when protecting Transformer models against bit-flip attacks (BFAs), this work proposes the first FPGA-based hardware accelerator for FaR. The architecture supports runtime resilience through dynamic activation-path rerouting, linear-layer parameter obfuscation, and low-latency configuration switching in reconfigurable logic, while lightweight storage of rewiring configurations keeps rerouting fast enough to balance security against real-time performance. Experiments across multiple Transformer models show that the design reduces FaR inference latency by 42.3%–68.1% and improves energy efficiency by 3.1×, while preserving the robustness of the original FaR method against BFAs. This work bridges a critical gap between algorithm-level resilience techniques and their efficient hardware deployment.
📝 Abstract
The Forget and Rewire (FaR) methodology has demonstrated strong resilience against Bit-Flip Attacks (BFAs) on Transformer-based models by obfuscating critical parameters through dynamic rewiring of linear layers. However, applying FaR introduces non-negligible performance and memory overheads, primarily because activation pathways are modified at runtime and no hardware-level optimization exists for these operations. To overcome these limitations, we propose FaRAccel, a novel hardware accelerator architecture implemented on FPGA and designed specifically to offload and optimize FaR operations. FaRAccel integrates reconfigurable logic for dynamic activation rerouting with lightweight storage of rewiring configurations, enabling low-latency inference with minimal energy overhead. We evaluate FaRAccel across a suite of Transformer models and demonstrate substantial reductions in FaR inference latency and improvements in energy efficiency, while maintaining the robustness gains of the original FaR methodology. To the best of our knowledge, this is the first hardware-accelerated defense against BFAs in Transformers, bridging the gap between algorithmic resilience and efficient deployment on real-world AI platforms.
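The abstract refers to rerouting activations according to stored rewiring configurations. The Python sketch below illustrates that general idea in software only. It is not FaR's actual rewiring algorithm or FaRAccel's datapath. The even-split rule, the function names `rewire` and `rerouted_forward`, and the per-row routing table `route` are all hypothetical assumptions. In this sketch, an important output neuron's weights are split evenly with a low-importance ("forgotten") neuron, and a small integer table tells the accumulation step which logical output each physical row contributes to. Biases are omitted for brevity.

```python
from typing import List, Tuple

Matrix = List[List[float]]

# Hypothetical illustration only; not the FaR or FaRAccel algorithm.


def rewire(weights: Matrix, pairs: List[Tuple[int, int]]) -> Tuple[Matrix, List[int]]:
    """Hypothetical rewiring step for a bias-free linear layer.

    For each (important, forgotten) pair of output rows, the forgotten row's
    weights are discarded ("forget") and replaced by half of the important
    row's weights; the important row keeps the other half ("rewire").
    Returns the new weights and a routing table where route[i] is the
    logical output index that physical row i contributes to.
    """
    new_w = [list(row) for row in weights]  # copy so the caller's weights are untouched
    route = list(range(len(weights)))       # identity routing by default
    for important, forgotten in pairs:
        half = [w / 2 for w in weights[important]]
        new_w[important] = half
        new_w[forgotten] = list(half)
        route[forgotten] = important        # reroute the donor row's activation
    return new_w, route


def rerouted_forward(weights: Matrix, route: List[int], x: List[float]) -> List[float]:
    """Compute every physical row's dot product, then accumulate into logical outputs."""
    physical = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    logical = [0.0] * len(weights)
    for i, value in enumerate(physical):
        logical[route[i]] += value
    return logical


if __name__ == "__main__":
    W = [[1.0, 2.0], [3.0, 4.0], [0.01, 0.0]]  # row 2 plays the low-importance neuron
    new_w, route = rewire(W, [(1, 2)])
    print(route)                                       # [0, 1, 1]
    print(rerouted_forward(new_w, route, [1.0, 1.0]))  # [3.0, 7.0, 0.0]
```

With this split, the important neuron's output is reproduced exactly (3.5 + 3.5 = 7.0) while each stored copy of its weights has half the original magnitude, and the forgotten neuron's output drops to zero. In hardware, a table like `route` could plausibly live in a small on-chip memory that drives an accumulation stage, which is one possible reading of "lightweight storage of rewiring configurations"; the paper's actual design may differ.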