LogicSparse: Enabling Engine-Free Unstructured Sparsity for Quantised Deep-learning Accelerators

📅 2025-11-05

📈 Citations: 0

✨ Influential: 0

career value

248K/year

🤖 AI Summary

To address performance and resource bottlenecks in quantized neural network (QNN) inference on edge devices—caused by model complexity—this paper proposes a hardware-efficient, unstructured sparsity-aware acceleration framework for FPGAs, requiring no dedicated sparse execution units. The method features: (i) a hardware-aware fine-grained pruning strategy co-optimized with quantization and dataflow architecture; and (ii) a restructured sparse data layout and memory access pattern that eliminates irregular memory accesses while preserving high parallelism. Evaluated on LeNet-5, the framework achieves 51.6× model compression and 1.23× throughput improvement, consuming only 5.12% of LUT resources. It significantly enhances energy efficiency and hardware utilization. This work establishes a lightweight, general-purpose, and hardware-friendly paradigm for deploying QNNs under severe resource constraints.

Technology Category

Application Category

📝 Abstract

FPGAs have been shown to be a promising platform for deploying Quantised Neural Networks (QNNs) with high-speed, low-latency, and energy-efficient inference. However, the complexity of modern deep-learning models limits the performance on resource-constrained edge devices. While quantisation and pruning alleviate these challenges, unstructured sparsity remains underexploited due to irregular memory access. This work introduces a framework that embeds unstructured sparsity into dataflow accelerators, eliminating the need for dedicated sparse engines and preserving parallelism. A hardware-aware pruning strategy is introduced to improve efficiency and design flow further. On LeNet-5, the framework attains 51.6 x compression and 1.23 x throughput improvement using only 5.12% of LUTs, effectively exploiting unstructured sparsity for QNN acceleration.

Problem

Research questions and friction points this paper is trying to address.

Exploiting unstructured sparsity in quantized neural networks for efficient acceleration

Eliminating dedicated sparse engines while maintaining parallelism in dataflow accelerators

Addressing irregular memory access challenges in resource-constrained edge devices

Innovation

Methods, ideas, or system contributions that make the work stand out.

Embeds unstructured sparsity into dataflow accelerators

Eliminates need for dedicated sparse engines

Uses hardware-aware pruning strategy for efficiency

🔎 Similar Papers

No similar papers found.