LogicSparse: Enabling Engine-Free Unstructured Sparsity for Quantised Deep-learning Accelerators

📅 2025-11-05
🤖 AI Summary
To address the performance and resource bottlenecks that model complexity creates for quantized neural network (QNN) inference on edge devices, this paper proposes a hardware-efficient, unstructured sparsity-aware acceleration framework for FPGAs that requires no dedicated sparse execution units. The method features: (i) a hardware-aware fine-grained pruning strategy co-optimized with quantization and the dataflow architecture; and (ii) a restructured sparse data layout and memory access pattern that eliminates irregular memory accesses while preserving high parallelism. Evaluated on LeNet-5, the framework achieves 51.6× model compression and a 1.23× throughput improvement while consuming only 5.12% of LUT resources, significantly enhancing energy efficiency and hardware utilization. This work establishes a lightweight, general-purpose, and hardware-friendly paradigm for deploying QNNs under severe resource constraints.
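The paper does not give implementation details of its restructured layout, but the general idea of turning unstructured sparsity into regular memory traffic can be sketched as follows: pack the nonzero weights and their column indices into dense, contiguous arrays offline (a CSR-like layout; `pack_sparse` and `sparse_matvec` are hypothetical names, not from the paper), so that inference is a plain sequential loop with no runtime sparse decoding.

```python
import numpy as np

def pack_sparse(W, threshold=0.0):
    """Offline step: pack rows of W into flat value/index arrays
    plus per-row offsets (CSR-like). All irregularity is resolved here."""
    values, indices, offsets = [], [], [0]
    for row in W:
        nz = np.nonzero(np.abs(row) > threshold)[0]
        values.extend(row[nz])
        indices.extend(nz)
        offsets.append(len(values))
    return (np.array(values, dtype=W.dtype),
            np.array(indices, dtype=np.int64),
            np.array(offsets, dtype=np.int64))

def sparse_matvec(values, indices, offsets, x):
    """Inference step: an engine-free sparse matrix-vector product.
    The loop reads values/indices strictly sequentially, so memory
    access is regular even though the sparsity pattern is not."""
    y = np.zeros(len(offsets) - 1, dtype=values.dtype)
    for r in range(len(y)):
        for k in range(offsets[r], offsets[r + 1]):
            y[r] += values[k] * x[indices[k]]
    return y
```

In a dataflow FPGA design the offline packing would be baked into on-chip memory initialization, so the accelerator itself needs no dedicated sparse engine; this sketch only illustrates the data-layout principle, not the paper's actual hardware mapping.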

📝 Abstract
FPGAs have been shown to be a promising platform for deploying Quantised Neural Networks (QNNs) with high-speed, low-latency, and energy-efficient inference. However, the complexity of modern deep-learning models limits performance on resource-constrained edge devices. While quantisation and pruning alleviate these challenges, unstructured sparsity remains underexploited due to irregular memory access. This work introduces a framework that embeds unstructured sparsity into dataflow accelerators, eliminating the need for dedicated sparse engines and preserving parallelism. A hardware-aware pruning strategy is introduced to further improve efficiency and the design flow. On LeNet-5, the framework attains 51.6× compression and a 1.23× throughput improvement using only 5.12% of LUTs, effectively exploiting unstructured sparsity for QNN acceleration.
Problem

Research questions and friction points this paper is trying to address.

Exploiting unstructured sparsity in quantized neural networks for efficient acceleration
Eliminating dedicated sparse engines while maintaining parallelism in dataflow accelerators
Addressing irregular memory access challenges in resource-constrained edge devices
Innovation

Methods, ideas, or system contributions that make the work stand out.

Embeds unstructured sparsity into dataflow accelerators
Eliminates need for dedicated sparse engines
Uses hardware-aware pruning strategy for efficiency