🤖 AI Summary
This work addresses the significant performance and energy efficiency degradation in existing DNN accelerators caused by security mechanisms that incur high hardware overhead and frequent off-chip memory accesses. To overcome these limitations, the authors propose a hardware-software co-designed secure memory protection framework that introduces a novel bandwidth-aware dynamic encryption granularity strategy. This approach leverages sliding-window overlap analysis to eliminate redundant memory accesses induced by intra- and inter-layer tiling, and incorporates a multi-level memory authentication mechanism to minimize unnecessary off-chip communication. Experimental evaluations on both server-class and edge NPUs demonstrate that the proposed method achieves strong security guarantees while reducing performance overhead by over 12% and improving energy efficiency by up to 87%, with excellent scalability across diverse architectures.
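The summary's sliding-window overlap analysis can be illustrated with a back-of-the-envelope calculation: adjacent tiles of a strided convolution must refetch the input rows their receptive fields share, and each refetched row is also re-decrypted and re-authenticated. The sketch below is a minimal illustration of that accounting, not code from the paper; the function names and the tiling-by-rows assumption are ours.

```python
def tile_overlap_rows(kernel, stride):
    # Adjacent row-tiles of a stride-s convolution with a k-row kernel must
    # share (k - s) input rows; those rows are fetched off-chip (and hence
    # decrypted/verified) more than once unless the overlap is exploited.
    return max(kernel - stride, 0)

def redundant_fraction(feature_h, tile_h, kernel, stride):
    # Fraction of total off-chip row traffic that is redundant refetching,
    # assuming the feature map is split into ceil(feature_h / tile_h) tiles.
    overlap = tile_overlap_rows(kernel, stride)
    n_tiles = -(-feature_h // tile_h)  # ceiling division
    extra = overlap * (n_tiles - 1)
    return extra / (feature_h + extra)
```

For a 224-row feature map, 56-row tiles, and a 3x3 stride-1 kernel, about 2.6% of row traffic is redundant; deeper tiling hierarchies and inter-layer tiling mismatches compound this, which is the redundancy the proposed framework targets.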
📝 Abstract
The rapid deployment of deep neural network (DNN) accelerators in safety-critical domains such as autonomous vehicles, healthcare systems, and financial infrastructure demands robust mechanisms for safeguarding data confidentiality and computational integrity. Existing security solutions for DNN accelerators, however, suffer from excessive hardware resource demands and frequent off-chip memory accesses, which degrade performance and scalability. To address these challenges, this paper presents a secure and efficient memory protection framework for DNN accelerators with minimal overhead. First, we propose a bandwidth-aware cryptographic scheme that adapts encryption granularity to memory traffic patterns, balancing security against resource efficiency. Second, we observe that the overlapping regions produced by the sliding-window pattern of intra-layer tiling, as well as those arising from mismatched inter-layer tiling strategies, cause substantial redundant memory accesses and repeated cryptographic computation; our framework analyzes these overlaps to eliminate the redundancy. Third, we introduce a multi-level authentication mechanism that eliminates unnecessary off-chip memory accesses, further improving performance and energy efficiency. Experimental results show that our framework reduces performance overhead by over 12% and achieves an 87% energy efficiency improvement on both server and edge neural processing units (NPUs), while ensuring robust scalability.
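The bandwidth-aware encryption granularity idea can be sketched as a simple policy: measure off-chip traffic over a sampling window, then choose a coarser encryption block under high traffic (amortizing per-block counter and MAC overhead) and a finer block under low traffic. The granularity set, thresholds, and units below are illustrative assumptions, not parameters from the paper.

```python
def select_granularity(bytes_per_window, window_us,
                       granularities=(64, 256, 1024),
                       high_gbps=10.0, low_gbps=2.0):
    # Hypothetical policy sketch. Measured bandwidth over the sampling window
    # picks the encryption block size: high traffic -> coarse blocks (fewer
    # per-block metadata fetches), low traffic -> fine blocks (tighter
    # protection granularity). Thresholds are illustrative only.
    gbps = bytes_per_window * 8 / (window_us * 1e3)  # bits per ns = Gbit/s
    if gbps >= high_gbps:
        return granularities[-1]   # coarsest block
    if gbps <= low_gbps:
        return granularities[0]    # finest block
    return granularities[len(granularities) // 2]
```

For example, 1 MB observed in a 100 µs window (80 Gbit/s) selects the 1024-byte block, while 1 KB in the same window selects the 64-byte block. A real design would presumably hystereses the thresholds to avoid oscillation, but the selection logic itself stays this simple.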