PoTAcc: A Pipeline for End-to-End Acceleration of Power-of-Two Quantized DNNs

📅 2026-05-07


🤖 AI Summary

Existing power-of-two (PoT) quantized deep neural networks (DNNs) lack efficient end-to-end deployment support on edge devices, and their hardware performance and energy efficiency have not been systematically evaluated. This work proposes PoTAcc, an open-source, end-to-end pipeline built on TensorFlow Lite. It deploys PoT-quantized models on both CPU-only and CPU-FPGA heterogeneous platforms (PYNQ-Z2 and Kria), using custom shift-based processing element (shift-PE) accelerators designed for three PoT quantization methods. PoTAcc also evaluates how these quantization strategies affect hardware design, performance, energy efficiency, and resource utilization during full inference. On CNN and Transformer-based models, the CPU-accelerator design achieves up to 3.6× speedup and 78% energy reduction compared to CPU-only execution.

📝 Abstract

Power-of-two (PoT) quantization significantly reduces the size of deep neural networks (DNNs) and replaces multiplications with bit-shift operations for inference. Prior work has shown that PoT-quantized DNNs can preserve accuracy for tasks such as image classification; however, their performance on resource-constrained edge devices remains insufficiently understood. While general-purpose edge CPUs and GPUs do not provide optimized backends for bit-shift operations, custom hardware accelerators can better exploit PoT quantization by implementing dedicated shift-based processing elements. However, deploying PoT-quantized models on such accelerators is challenging due to limited support in existing inference frameworks. In addition, the impact of different PoT quantization strategies on hardware design, performance, and energy efficiency during full inference has not been systematically explored. To address these challenges, we propose PoTAcc, an open-source end-to-end pipeline for accelerating and evaluating PoT-quantized DNNs on resource-constrained edge devices. PoTAcc enables seamless preparation and deployment of PoT-quantized models via TensorFlow Lite (TFLite) across heterogeneous platforms, including CPU-only systems and hybrid CPU-FPGA systems with custom accelerators. We design shift-based processing element (shift-PE) accelerators for three PoT quantization methods and implement them on two FPGA platforms. We evaluate accuracy, performance, energy efficiency, and resource utilization across a range of models, including CNNs and Transformer-based architectures. Results show that our CPU-accelerator design achieves up to 3.6x speedup and 78% energy reduction compared to CPU-only execution for PoT-quantized DNNs on PYNQ-Z2 and Kria boards. The code will be publicly released at https://github.com/gicLAB/PoTAcc
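To make the multiplication-to-shift substitution concrete, here is a minimal Python sketch. It is not PoTAcc's implementation and does not model any of the three PoT quantization methods the paper evaluates. The (sign, exponent) encoding, the hypothetical exponent range MIN_EXP..MAX_EXP, the rounding of log2(|w|), and the helper names quantize_pot and shift_dot are all illustrative assumptions.

```python
import math
from typing import List, Tuple

# Hypothetical exponent range (an assumption, not taken from PoTAcc):
# representable weight magnitudes are 2**-7 ... 2**0.
MIN_EXP = -7
MAX_EXP = 0
# Fixed-point fraction bits chosen so every exponent maps to a left shift.
FRAC_BITS = -MIN_EXP

PotCode = Tuple[int, int]  # (sign, exponent); sign == 0 encodes an exact zero


def quantize_pot(w: float) -> PotCode:
    """Map a float weight to sign * 2**exponent by rounding log2(|w|)."""
    if w == 0.0:
        return (0, 0)
    sign = 1 if w > 0 else -1
    exp = round(math.log2(abs(w)))
    exp = max(MIN_EXP, min(MAX_EXP, exp))  # clamp to the representable range
    return (sign, exp)


def shift_dot(activations: List[int], codes: List[PotCode]) -> int:
    """Integer dot product using only shifts and additions.

    The result is scaled by 2**FRAC_BITS, so negative exponents stay exact.
    """
    acc = 0
    for x, (sign, exp) in zip(activations, codes):
        if sign == 0:
            continue  # zero weight: contributes nothing
        term = x << (exp + FRAC_BITS)  # x * 2**exp, in fixed point
        acc += term if sign > 0 else -term
    return acc


weights = [0.5, -0.26, 0.9, 0.0, -0.03]
activations = [10, 4, -3, 7, 64]
codes = [quantize_pot(w) for w in weights]
acc = shift_dot(activations, codes)
print(codes)               # [(1, -1), (-1, -2), (1, 0), (0, 0), (-1, -5)]
print(acc / 2**FRAC_BITS)  # -1.0
```

Rounding in the log2 domain is only the simplest choice. A value such as 1.45 × 2^k rounds up to 2^(k+1) here even though it is linearly closer to 2^k. Pre-scaling the accumulator by 2**FRAC_BITS turns every weight into a left shift, which keeps the sketch exact. A hardware shift-PE may organize the shift and accumulation differently.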

Problem

Research questions and friction points this paper is trying to address.

Power-of-Two quantization

DNN acceleration

edge devices

hardware deployment

inference efficiency

Innovation

Methods, ideas, or system contributions that make the work stand out.

Power-of-Two quantization

hardware accelerator

shift-based processing element