SPADE: A SIMD Posit-enabled compute engine for Accelerating DNN Efficiency

📅 2026-01-24
🤖 AI Summary
This work addresses the challenge of balancing accuracy, energy efficiency, and hardware overhead in edge-AI systems, where conventional numeric formats struggle to support efficient multi-precision computation. The paper presents a unified SIMD multiply-accumulate (MAC) architecture that natively supports the Posit(8,0), Posit(16,1), and Posit(32,2) formats. Its regime-aware, lane-fused datapath reuses Posit-specific submodules (leading-one detectors, two's complement units, shifters, and multipliers) across all three precisions, eliminating redundant circuitry. The design is implemented on both FPGA (Xilinx Virtex-7) and ASIC (TSMC 28 nm). On FPGA, the Posit(8,0) configuration uses 45.13% fewer LUTs and 80% fewer slices than prior work. The 28 nm ASIC implementation operates at 1.38 GHz while consuming 6.1 mW, and inference accuracy remains competitive on benchmarks such as MNIST and CIFAR-10/100.
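To make the role of those submodules concrete, here is a minimal software sketch of decoding a Posit(n, es) bit pattern: a two's complement step for negative inputs, a run-length count over the regime (the job a leading-one detector does in hardware), and shifts to pull out the exponent and fraction. It follows the general posit encoding, not the paper's RTL; the function name `decode_posit` and the use of `Fraction` for exact values are illustrative assumptions.

```python
from fractions import Fraction
from typing import Optional


def decode_posit(bits: int, n: int, es: int) -> Optional[Fraction]:
    """Decode an n-bit posit with es exponent bits into an exact Fraction.

    Hypothetical helper for illustration only. Returns None for NaR.
    """
    mask = (1 << n) - 1
    bits &= mask
    if bits == 0:
        return Fraction(0)
    if bits == 1 << (n - 1):
        return None  # NaR: sign bit set, all other bits zero

    sign = bits >> (n - 1)
    if sign:
        bits = (-bits) & mask  # two's complement of negative inputs

    rem = n - 1                       # bits left after the sign bit
    body = bits & ((1 << rem) - 1)

    # Regime: count the run of identical bits starting at the top of body.
    first = (body >> (rem - 1)) & 1
    run = 0
    while run < rem and ((body >> (rem - 1 - run)) & 1) == first:
        run += 1
    k = run - 1 if first else -run

    # The run plus its terminating bit (if present) are consumed.
    used = min(run + 1, rem)
    tail_len = rem - used
    tail = body & ((1 << tail_len) - 1)

    # Exponent: up to es bits; bits cut off by the regime count as zero.
    exp_len = min(es, tail_len)
    exp = (tail >> (tail_len - exp_len)) << (es - exp_len) if exp_len else 0

    # Fraction: whatever is left, with an implicit leading one.
    frac_len = tail_len - exp_len
    frac = tail & ((1 << frac_len) - 1)
    significand = Fraction((1 << frac_len) + frac, 1 << frac_len)

    scale = k * (1 << es) + exp
    value = significand * Fraction(2) ** scale
    return -value if sign else value
```

For example, `decode_posit(0x50, 8, 0)` yields 3/2, while `decode_posit(0x5000, 16, 1)` and `decode_posit(0x48000000, 32, 2)` both yield 2. The same three steps apply at every width, which is what makes sharing the underlying logic across precisions plausible.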

📝 Abstract
The growing demand for edge-AI systems requires arithmetic units that balance numerical precision, energy efficiency, and compact hardware while supporting diverse formats. Posit arithmetic offers advantages over floating- and fixed-point representations through its tapered precision, wide dynamic range, and improved numerical robustness. This work presents SPADE, a unified multi-precision SIMD Posit-based multiply-accumulate (MAC) architecture supporting Posit(8,0), Posit(16,1), and Posit(32,2) within a single framework. Unlike prior single-precision or floating/fixed-point SIMD MACs, SPADE introduces a regime-aware, lane-fused SIMD Posit datapath that hierarchically reuses Posit-specific submodules (LOD, complementor, shifter, and multiplier) across 8/16/32-bit precisions without datapath replication. FPGA implementation on a Xilinx Virtex-7 shows 45.13% LUT and 80% slice reduction for Posit(8,0), and up to 28.44% and 17.47% improvement for Posit(16,1) and Posit(32,2) over prior work, with only 6.9% LUT and 14.9% register overhead for multi-precision support. ASIC results across TSMC nodes achieve 1.38 GHz at 6.1 mW (28 nm). Evaluation on MNIST, CIFAR-10/100, and alphabet datasets confirms competitive inference accuracy.
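One way to picture a lane-fused SIMD datapath is as a single wide word that is interpreted as several narrow operands or one wide operand, depending on the precision mode. The sketch below assumes a 32-bit word holding four 8-bit, two 16-bit, or one 32-bit lane; that word width, the lane counts, and the names `LANES_PER_WORD` and `split_lanes` are assumptions for illustration and are not taken from the paper.

```python
from typing import List

# Assumed packing: lane width in bits -> number of lanes in one 32-bit word.
LANES_PER_WORD = {8: 4, 16: 2, 32: 1}


def split_lanes(word: int, width: int) -> List[int]:
    """Split a 32-bit word into posit bit patterns, least significant lane first.

    Hypothetical helper: models only the operand partitioning of a SIMD
    unit, not the arithmetic performed on each lane.
    """
    if width not in LANES_PER_WORD:
        raise ValueError("width must be 8, 16, or 32")
    word &= 0xFFFFFFFF
    mask = (1 << width) - 1
    return [(word >> (i * width)) & mask for i in range(LANES_PER_WORD[width])]
```

Under these assumptions, `split_lanes(0x40506070, 8)` gives `[0x70, 0x60, 0x50, 0x40]`, and the same word in 32-bit mode is passed through as one lane. In hardware, the appeal of this arrangement is that the per-lane logic for narrow modes can be composed into the wide-mode logic instead of being duplicated.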
Problem

Research questions and friction points this paper is trying to address.

edge-AI
arithmetic units
numerical precision
energy efficiency
hardware compactness
Innovation

Methods, ideas, or system contributions that make the work stand out.

Posit arithmetic
SIMD architecture
multi-precision MAC
hardware reuse
edge-AI
Sonu Kumar
Centre for Advanced Electronics, IIT Indore
Lavanya Vinnakota
NSDCS Research Group, IIT Indore
Mukul Lokhande
NSDCS Research Group, IIT Indore
S. Vishvakarma
Centre for Advanced Electronics, IIT Indore
Adam Teman
Bar Ilan University
Embedded Memories, Energy Efficient Circuit Design, Domain-Specific Architectures, RISC-V, Physical Design