🤖 AI Summary
Addressing the challenge of balancing energy efficiency and accuracy in low-power DNN hardware acceleration, this paper proposes a three-level collaborative approximation method, operating at the layer, filter, and convolution kernel granularities, and introduces a fine-grained, multi-granularity deployment of ROUP approximate multipliers aligned with the error-resilience characteristics of individual DNN layers. Leveraging quantization-aware approximate modeling and fine-grained error distribution optimization, the approach overcomes the limitations of conventional coarse-grained, single-level approximation. Experimental evaluation on ResNet-8/CIFAR-10 demonstrates up to 54% energy gains over the baseline quantized model (with up to 4% accuracy loss) and achieves twice the energy efficiency of state-of-the-art DNN approximation methods while maintaining higher accuracy. This work establishes a co-design paradigm integrating approximate computing with neural network architecture optimization.
📝 Abstract
Nowadays, the rapid growth of Deep Neural Network (DNN) architectures has established them as the de facto approach for delivering advanced Machine Learning tasks with excellent accuracy. Targeting low-power DNN computing, this paper examines the interplay between the fine-grained error resilience of DNN workloads and hardware approximation techniques to achieve higher levels of energy efficiency. Utilizing the state-of-the-art ROUP approximate multipliers, we systematically explore their fine-grained distribution across the network according to our layer-, filter-, and kernel-level approaches, and examine their impact on accuracy and energy. We evaluate our approximations using the ResNet-8 model on the CIFAR-10 dataset. The proposed solution delivers up to 54% energy gains in exchange for up to 4% accuracy loss compared to the baseline quantized model, while providing 2x energy gains with better accuracy versus state-of-the-art DNN approximations.
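To make the layer-level idea concrete, the sketch below illustrates one way such a scheme could work: each layer gets an approximate-multiplier configuration matched to its estimated error resilience, and the resulting multiplication-energy savings are estimated against an all-exact baseline. This is a minimal illustrative sketch only; the layer names, resilience scores, MAC counts, and per-level energy factors are all hypothetical assumptions, not values or APIs from the paper.

```python
# Hypothetical per-level relative energy cost of one multiplication:
# level 0 = exact multiplier, higher levels = more aggressive ROUP-style
# approximation (illustrative numbers, not from the paper).
ENERGY_PER_MAC = {0: 1.00, 1: 0.75, 2: 0.55, 3: 0.40}

def assign_levels(resilience, thresholds=(0.3, 0.6, 0.8)):
    """Map each layer's error-resilience score in [0, 1] to an
    approximation level: more resilient layers get cheaper multipliers."""
    return {name: sum(score >= t for t in thresholds)
            for name, score in resilience.items()}

def energy_gain(macs, levels):
    """Fraction of multiplication energy saved vs. an all-exact baseline."""
    baseline = sum(macs.values())
    approx = sum(macs[n] * ENERGY_PER_MAC[levels[n]] for n in macs)
    return 1.0 - approx / baseline

# Toy ResNet-8-like profile: early layers are fragile (low resilience),
# later layers tolerate more approximation. All values are made up.
resilience = {"conv1": 0.1, "conv2": 0.5, "conv3": 0.7, "fc": 0.9}
macs = {"conv1": 2.0e6, "conv2": 4.0e6, "conv3": 4.0e6, "fc": 0.5e6}

levels = assign_levels(resilience)
print(levels)                          # per-layer approximation levels
print(round(energy_gain(macs, levels), 3))  # estimated energy savings
```

The filter- and kernel-level approaches described in the abstract would refine this further, assigning levels within a layer rather than one level per layer.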