🤖 AI Summary
This work addresses the challenge of balancing accuracy, energy efficiency, and hardware overhead in edge AI systems, where conventional numeric formats struggle to support efficient multi-precision computation. The paper presents SPADE, a unified SIMD multiply-accumulate (MAC) architecture that natively supports the Posit(8,0), Posit(16,1), and Posit(32,2) formats. Its regime-aware, lane-fused datapath reuses Posit-specific submodules (leading-one detectors, two’s complement units, shifters, and multipliers) across the 8-, 16-, and 32-bit precisions instead of replicating the datapath for each one. The design is implemented on both FPGA (Xilinx Virtex-7) and ASIC (TSMC 28 nm). On FPGA, the Posit(8,0) configuration uses 80% fewer slices and 45.13% fewer LUTs than prior work. The 28 nm ASIC implementation runs at 1.38 GHz while consuming 6.1 mW, and inference accuracy remains competitive on MNIST, CIFAR-10/100, and alphabet datasets.
📝 Abstract
The growing demand for edge-AI systems requires arithmetic units that balance numerical precision, energy efficiency, and compact hardware while supporting diverse formats. Posit arithmetic offers advantages over floating- and fixed-point representations through its tapered precision, wide dynamic range, and improved numerical robustness. This work presents SPADE, a unified multi-precision SIMD Posit-based multiply-accumulate (MAC) architecture supporting Posit (8,0), Posit (16,1), and Posit (32,2) within a single framework. Unlike prior single-precision or floating/fixed-point SIMD MACs, SPADE introduces a regime-aware, lane-fused SIMD Posit datapath that hierarchically reuses Posit-specific submodules (LOD, complementor, shifter, and multiplier) across 8/16/32-bit precisions without datapath replication. FPGA implementation on a Xilinx Virtex-7 shows 45.13% LUT and 80% slice reduction for Posit (8,0), and up to 28.44% and 17.47% improvement for Posit (16,1) and Posit (32,2) over prior work, with only 6.9% LUT and 14.9% register overhead for multi-precision support. ASIC results across TSMC nodes achieve 1.38 GHz at 6.1 mW (28 nm). Evaluation on MNIST, CIFAR-10/100, and alphabet datasets confirms competitive inference accuracy.
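To make the three formats concrete, the following is a minimal software sketch of how a Posit(n, es) bit pattern can be decoded, following the standard posit field layout (sign, regime run, exponent, fraction). It is not the SPADE hardware datapath. The names `decode_posit` and `split_lanes` are hypothetical, and the assumption that one 32-bit word carries four 8-bit, two 16-bit, or one 32-bit lane is ours; the abstract does not state the lane packing. The regime run count in the loop plays the role that a leading-one detector (LOD) would play in hardware, and the negation step corresponds to the complementor.

```python
import math

def decode_posit(bits: int, n: int, es: int) -> float:
    """Decode an n-bit posit with es exponent bits into a Python float."""
    mask = (1 << n) - 1
    bits &= mask
    if bits == 0:
        return 0.0
    if bits == 1 << (n - 1):            # 100...0 is NaR (not a real)
        return math.nan
    sign = bits >> (n - 1)
    if sign:                            # negative posits are two's complemented
        bits = (-bits) & mask
    # Regime: run of identical bits starting just below the sign bit.
    r0 = (bits >> (n - 2)) & 1
    run, i = 0, n - 2
    while i >= 0 and ((bits >> i) & 1) == r0:
        run += 1
        i -= 1
    k = run - 1 if r0 else -run
    # Bits left after the sign, the regime run, and the terminating bit.
    rem_len = max(0, n - 2 - run)
    rem = bits & ((1 << rem_len) - 1)
    if rem_len >= es:
        frac_len = rem_len - es
        exp = rem >> frac_len
        frac = rem & ((1 << frac_len) - 1)
    else:                               # truncated exponent bits read as zeros
        frac_len = 0
        exp = rem << (es - rem_len)
        frac = 0
    scale = k * (1 << es) + exp         # value = 2^scale * (1 + fraction)
    value = (1.0 + frac / (1 << frac_len)) * 2.0 ** scale
    return -value if sign else value

def split_lanes(word: int, width: int) -> list[int]:
    """Split a 32-bit word into 32 // width lanes, least significant first.

    Assumed packing for illustration only.
    """
    lane_mask = (1 << width) - 1
    return [(word >> shift) & lane_mask for shift in range(0, 32, width)]

# Four Posit(8,0) operands packed into one 32-bit word (assumed layout).
lanes = split_lanes(0x40605020, 8)
values = [decode_posit(lane, 8, 0) for lane in lanes]   # [0.5, 1.5, 2.0, 1.0]
```

The same `decode_posit` routine handles all three formats by changing only `n` and `es`, which mirrors at the software level why sharing one set of regime-handling logic across precisions is attractive.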