Hardware/Software Co-Design of RISC-V Extensions for Accelerating Sparse DNNs on FPGAs

📅 2025-04-28
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
In TinyML-oriented, FPGA-based DNN inference, coexisting semi-structured and unstructured sparsity leads to low hardware utilization and fragmented acceleration. To address this, we propose a hardware/software co-design approach built around a RISC-V instruction set extension. Our contributions are threefold: (1) a bit-level sparse encoding scheme that enables fine-grained skipping of semi-structured sparse computations; (2) a variable-cycle sequential MAC unit that dynamically adapts to the number of non-zero weights, improving the efficiency of unstructured sparse computation; and (3) a unified hardware architecture that accelerates both sparsity types simultaneously. Combined with a sparsity-aware software flow and the bit-level configurability of FPGAs, our designs achieve speedups of up to 3×, 4×, and 5× for unstructured, semi-structured, and combined sparsity acceleration, respectively, on TinyML tasks such as keyword spotting, image classification, and person detection, while incurring significantly lower resource overhead than state-of-the-art approaches and enabling efficient deployment on small-scale FPGAs.
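The block-skipping idea behind the bit-level encoding can be made concrete with a small sketch. The following C model is illustrative only: the block size, the mask layout, and the function name are assumptions made for this example, and the mask is passed as a separate array for clarity, whereas the paper embeds the sparsity information in reserved bits of the preceding weight blocks.

```c
/* Illustrative sketch of block-level skipping for semi-structured sparsity.
 * BLOCK, the mask layout, and sparse_dot() are assumptions for this example;
 * the paper's actual encoding and hardware interface may differ. */
#include <stdint.h>
#include <stddef.h>

#define BLOCK 4  /* assumed: weights are grouped into blocks of 4 */

/* Dot product over weight blocks; skip_mask holds one bit per block
 * (1 = the block contains only zeros and is skipped entirely). */
int32_t sparse_dot(const int8_t *w, const int8_t *x,
                   size_t n_blocks, const uint8_t *skip_mask)
{
    int32_t acc = 0;
    for (size_t b = 0; b < n_blocks; b++) {
        if (skip_mask[b / 8] & (1u << (b % 8)))
            continue;                       /* whole block is zero: skip it */
        for (size_t i = 0; i < BLOCK; i++)  /* dense MAC over the block */
            acc += (int32_t)w[b * BLOCK + i] * (int32_t)x[b * BLOCK + i];
    }
    return acc;
}
```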

📝 Abstract
The customizability of RISC-V makes it an attractive choice for accelerating deep neural networks (DNNs). This customization can be achieved through instruction set extensions and corresponding custom functional units. Yet, efficiently exploiting these opportunities requires a hardware/software co-design approach in which the DNN model, software, and hardware are designed together. In this paper, we propose novel RISC-V extensions for accelerating DNN models containing semi-structured and unstructured sparsity. While the idea of accelerating structured and unstructured pruning is not new, our design offers several advantages over existing ones. To exploit semi-structured sparsity, we take advantage of the fine-grained (bit-level) configurability of FPGAs and reserve a few bits in a block of DNN weights to encode information about sparsity in the succeeding blocks. The proposed custom functional unit uses this information to skip computations. To exploit unstructured sparsity, we propose a variable-cycle sequential multiply-and-accumulate (MAC) unit that performs only as many multiplications as there are non-zero weights. Our implementations of the unstructured and semi-structured pruning accelerators provide speedups of up to a factor of 3 and 4, respectively. We then propose a combined design that can accelerate both types of sparsity, providing speedups of up to a factor of 5. Our designs consume only a small amount of additional FPGA resources, so the resulting co-designs enable the acceleration of DNNs even on small FPGAs. We benchmark our designs on standard TinyML applications such as keyword spotting, image classification, and person detection.
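As a rough illustration of the variable-cycle MAC behaviour described in the abstract, the sketch below is a software model under assumed 8-bit weights and activations; the names serial_mac and mac_result are hypothetical. The actual unit is a hardware functional unit exposed through a custom RISC-V instruction, but the model shows the key property: one serial multiply per non-zero weight, so sparse rows finish in fewer cycles than dense ones.

```c
/* Behavioural sketch of a variable-cycle serial MAC: one multiply is issued
 * per non-zero weight, so the cycle count tracks the weight density.
 * Software model for illustration only; types and names are assumptions. */
#include <stdint.h>
#include <stddef.h>

typedef struct {
    int32_t acc;     /* running accumulator */
    size_t  cycles;  /* multiplications actually performed */
} mac_result;

mac_result serial_mac(const int8_t *w, const int8_t *x, size_t n)
{
    mac_result r = { 0, 0 };
    for (size_t i = 0; i < n; i++) {
        if (w[i] == 0)
            continue;              /* zero weight: no multiply, no cycle */
        r.acc += (int32_t)w[i] * (int32_t)x[i];
        r.cycles++;                /* one serial multiply per non-zero weight */
    }
    return r;
}
```

In hardware, this variable latency is what lets the unit's runtime shrink with the sparsity of each weight row instead of paying a fixed cost for the full vector length.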
Problem

Research questions and friction points this paper is trying to address.

Accelerating sparse DNNs using RISC-V extensions on FPGAs
Hardware/software co-design for semi-structured and unstructured sparsity
Optimizing FPGA resources to enable DNN acceleration on small devices
Innovation

Methods, ideas, or system contributions that make the work stand out.

RISC-V extensions for sparse DNN acceleration
FPGA-based bit-level sparsity encoding
Variable cycle sequential MAC unit
Muhammad Sabih
Department of Computer Science, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Germany
Abrarul Karim
Department of Computer Science, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Germany
Jakob Wittmann
Department of Computer Science, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Germany
Frank Hannig
Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU)
Jürgen Teich
Department of Computer Science, Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Germany