Lightweight Software Kernels and Hardware Extensions for Efficient Sparse Deep Neural Networks on Microcontrollers

📅 2025-03-08

📈 Citations: 0

✨ Influential: 0

career value

271K/year

🤖 AI Summary

To address inefficient N:M sparse DNN inference on resource-constrained microcontrollers (MCUs), this paper proposes a hardware-software co-design acceleration framework. It introduces a lightweight RISC-V instruction set extension supporting indirect loading and index decompression, develops high-efficiency sparse GEMM and convolution kernels, and integrates end-to-end sparse compilation into TVM. The key contribution is the first complete N:M sparse computing stack tailored for RISC-V MCUs, balancing hardware extensibility with software deployability. Experimental results show that the sparse kernels achieve 2.1×–3.4× speedup over dense baselines; the custom instructions further improve performance by 1.9×. End-to-end inference acceleration reaches 3.21× for ResNet-18 and 1.81× for ViT, with negligible accuracy degradation (<1.5% top-1 error).

Technology Category

Application Category

📝 Abstract

The acceleration of pruned Deep Neural Networks (DNNs) on edge devices such as Microcontrollers (MCUs) is a challenging task, given the tight area- and power-constraints of these devices. In this work, we propose a three-fold contribution to address this problem. First, we design a set of optimized software kernels for N:M pruned layers, targeting ultra-low-power, multicore RISC-V MCUs, which are up to 2.1x and 3.4x faster than their dense counterparts at 1:8 and 1:16 sparsity, respectively. Then, we implement a lightweight Instruction-Set Architecture (ISA) extension to accelerate the indirect load and non-zero indices decompression operations required by our kernels, obtaining up to 1.9x extra speedup, at the cost of a 5% area overhead. Lastly, we extend an open-source DNN compiler to utilize our sparse kernels for complete networks, showing speedups of 3.21x and 1.81x on a ResNet18 and a Vision Transformer (ViT), with less than 1.5% accuracy drop compared to a dense baseline.

Problem

Research questions and friction points this paper is trying to address.

Accelerating pruned DNNs on resource-constrained MCUs

Optimizing software kernels for N:M pruned layers

Extending ISA to enhance sparse DNN performance

Innovation

Methods, ideas, or system contributions that make the work stand out.

Optimized software kernels for N:M pruned layers

Lightweight ISA extension for indirect load operations

Extended DNN compiler for sparse kernel utilization

🔎 Similar Papers

No similar papers found.