🤖 AI Summary
This work proposes an approach for efficiently executing sparse neural networks on dense matrix-multiplication accelerators instead of conventional specialized sparse architectures. By optimizing the layout and scheduling of pruned sparse data, the method runs sparse workloads on dense processing elements (PEs). Because dense PEs are much smaller than sparse PEs, more of them fit within the same hardware area, which more than compensates for their lower utilization on sparse data. Eliminating the complex index-matching circuitry that dedicated sparse accelerators require also reduces both area and power overhead. Experimental results show that the proposed technique outperforms specialized sparse accelerators in both area efficiency and energy efficiency.
📝 Abstract
As Deep Neural Networks (DNNs) grow dramatically in size to achieve high accuracy, they require a large amount of computation and a large memory footprint. Pruning, which produces a sparse neural network, is one way to reduce the computational complexity of neural network processing. To maximize performance on such compressed data, dedicated sparse neural network accelerators have been introduced, but the complex circuits that match the indices of non-zero inputs and weights incur large area and power overhead in the processing elements (PEs). As a result, a sparse PE becomes significantly larger than a dense PE, which raises an interesting question for designers: "Given the area, isn't it better to use a larger number of dense PEs despite their low utilization in sparse matrix computations?" In this paper, we show that the answer is "yes", and demonstrate an area- and energy-efficient method for sparse neural network computing on dense-matrix multiplication hardware accelerators (Sparse-on-Dense).
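The designers' question above reduces to an area-normalized throughput comparison. The sketch below is a simple first-order model, not the paper's evaluation methodology. It assumes a fixed area budget filled with identical PEs, and it summarizes each design by a single utilization figure (the fraction of PE-cycles spent on useful, non-zero MACs). The names `PEDesign` and `useful_macs_per_cycle` and all numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class PEDesign:
    name: str
    pe_area: float      # area of one PE (arbitrary units; hypothetical)
    utilization: float  # fraction of PE-cycles doing useful non-zero MACs (hypothetical)


def useful_macs_per_cycle(design: PEDesign, area_budget: float) -> float:
    """Useful MACs per cycle when `area_budget` is filled with identical PEs."""
    num_pes = int(area_budget // design.pe_area)  # only whole PEs fit
    return num_pes * design.utilization


# Hypothetical figures for illustration only, not measurements from the paper.
AREA = 1000.0
dense = PEDesign("dense", pe_area=1.0, utilization=0.4)
sparse = PEDesign("sparse", pe_area=3.0, utilization=0.9)

if __name__ == "__main__":
    for d in (dense, sparse):
        print(f"{d.name}: {useful_macs_per_cycle(d, AREA):.1f} useful MACs/cycle")
```

Ignoring rounding to whole PEs, this model favors dense PEs whenever the sparse-to-dense PE area ratio exceeds the ratio of their utilizations (here 3.0 > 0.9 / 0.4 = 2.25). It deliberately ignores memory traffic, control logic, and energy, all of which matter for a full area- and energy-efficiency comparison.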