Coding for Computation: Efficient Compression of Neural Networks for Reconfigurable Hardware

📅 2025-04-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Neural network inference on reconfigurable hardware (e.g., FPGAs) is dominated by the cost of addition operations, yet existing compression methods prioritize reducing the memory footprint over computational efficiency. Method: This paper proposes a hardware-aware compression framework that explicitly minimizes the number of additions required for inference. It shifts the compression objective from weight-storage optimization to computation optimization by combining pruning via regularized training, weight sharing, and linear computation coding (LCC), which computes the resulting matrix-vector products with few additions in a hardware-friendly manner. Contribution/Results: Evaluated on a multilayer perceptron and ResNet-34, the method achieves competitive accuracy while significantly reducing the addition count, and thereby improves inference throughput and FPGA resource utilization by co-optimizing algorithmic structure and hardware constraints.

📝 Abstract
As state-of-the-art neural networks (NNs) continue to grow in size, their resource-efficient implementation becomes ever more important. In this paper, we introduce a compression scheme that reduces the number of computations required for NN inference on reconfigurable hardware such as FPGAs. This is achieved by combining pruning via regularized training, weight sharing and linear computation coding (LCC). Contrary to common NN compression techniques, where the objective is to reduce the memory used for storing the weights of the NNs, our approach is optimized to reduce the number of additions required for inference in a hardware-friendly manner. The proposed scheme achieves competitive performance for simple multilayer perceptrons, as well as for large-scale deep NNs such as ResNet-34.
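The shift-and-add idea behind computation coding can be illustrated with a toy sketch (this is illustrative, not the paper's LCC algorithm, and the `pow2_approx` helper is hypothetical): if each weight is approximated by a short sum of signed powers of two, multiplying by it reduces to bit shifts, and combining T terms costs only T - 1 additions per output.

```python
import numpy as np

def pow2_approx(w, terms=2):
    """Greedily approximate w by a sum of `terms` signed powers of two.

    In hardware, multiplying x by each power-of-two term is a bit shift,
    so the product w * x needs at most `terms` - 1 additions, no multiplier.
    """
    approx, parts = 0.0, []
    for _ in range(terms):
        residual = w - approx
        if residual == 0.0:
            break
        exponent = int(np.round(np.log2(abs(residual))))
        part = np.copysign(2.0 ** exponent, residual)
        parts.append(part)
        approx += part
    return approx, parts

# Two shift-add terms already approximate the weight closely.
approx, parts = pow2_approx(0.343, terms=2)
print(parts, approx)  # [0.25, 0.125] -> 0.375
```

The same idea extended to whole matrices (factoring a weight matrix into sparse power-of-two factors) is the essence of LCC; this scalar version only shows why additions, not multiplications, become the quantity worth minimizing.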
Problem

Research questions and friction points this paper is trying to address.

Efficient compression of large neural networks
Reduce computations for NN inference on FPGAs
Hardware-friendly approach minimizing addition operations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines pruning, weight sharing, LCC
Reduces additions for hardware efficiency
Optimized for reconfigurable hardware like FPGAs
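To see why pruning targets the addition count directly, and not only storage, here is a minimal sketch (assumed for illustration, not the paper's training pipeline): in a matrix-vector product each nonzero row of the weight matrix costs one fewer addition than it has nonzero entries, so magnitude pruning shrinks the adder budget roughly in proportion to the induced sparsity.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))  # dense layer: 64 * 63 = 4032 additions

# Magnitude pruning: keep only the largest ~10% of weights by absolute value.
threshold = np.quantile(np.abs(W), 0.90)
W_pruned = np.where(np.abs(W) > threshold, W, 0.0)

def addition_count(M):
    """Additions needed for y = M @ x: each nonzero row costs nnz - 1 adds."""
    nnz_per_row = np.count_nonzero(M, axis=1)
    return int(np.maximum(nnz_per_row - 1, 0).sum())

print(addition_count(W), addition_count(W_pruned))
```

With 90% of the weights removed, the remaining additions drop to roughly a tenth of the dense count; weight sharing and LCC then cut the cost of the surviving nonzeros further.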
Hans Rosenberger
Institute for Digital Communications, Friedrich-Alexander-Universität (FAU), Erlangen, Germany
Rodrigo Fischer
Communications Engineering Lab, Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany
Johanna S. Fröhlich
Institute for Digital Communications, Friedrich-Alexander-Universität (FAU), Erlangen, Germany
Ali Bereyhi
University of Toronto
Statistical Learning · Information Theory · Signal Processing · Wireless Communications · Statistical Mechanics
Ralf R. Müller
Institute for Digital Communications, Friedrich-Alexander-Universität (FAU), Erlangen, Germany