🤖 AI Summary
This work targets neural network acceleration for advanced driver-assistance systems (ADAS), where inference must meet tight latency, power, and area budgets, by proposing EULER-ADAS, a SIMD-enabled logarithmic bounded-Posit arithmetic engine. The architecture supports mixed-precision Posit-(8,0)/(16,1)/(32,2) computation on unified hardware through bounded-regime Posit representation, stage-adaptive logarithmic mantissa multiplication with bit truncation, and a shared quire accumulation path, trading a small accuracy loss for lower area, delay, and power. FPGA implementation results show up to 41.4% fewer LUTs, up to 76.1% lower delay, and up to 71.9% lower power than exact Posit neural compute engines, and up to 10x lower energy-delay product than radix-4 Booth-based Posit multipliers. Across the evaluated workloads, the Posit-16 and Posit-32 configurations stay within about 1.5 percentage points of FP32 accuracy. A TinyYOLOv3 prototype on Pynq-Z2 achieves 78 ms inference latency at 0.29 W.
📝 Abstract
Advanced driver-assistance systems (ADAS) require neural compute engines that deliver low-latency inference under strict power and area constraints. Posit arithmetic is attractive for such accelerators because it provides high numerical fidelity at low precision, but its variable-length regime encoding increases encode/decode cost and exposes the datapath to large regime-field fault effects. This paper presents EULER-ADAS, a SIMD-enabled logarithmic bounded-Posit neural compute engine for energy-efficient and reliability-aware ADAS acceleration. The proposed datapath combines bounded-regime Posit representation, stage-adaptive logarithmic mantissa multiplication with bit truncation, and a SIMD-shared quire accumulation path supporting Posit-(8,0), Posit-(16,1), and Posit-(32,2) execution. The unified architecture enables 4xPosit-8, 2xPosit-16, or 1xPosit-32 operation without duplicating precision-specific hardware. FPGA implementation shows that the proposed configurations reduce LUT count by up to 41.4%, delay by up to 76.1%, and power by up to 71.9% relative to exact Posit neural compute engines, while achieving up to 10x lower energy-delay product than radix-4 Booth-based Posit multipliers. In 28-nm CMOS, the bounded variants occupy 0.013-0.016 mm², consume 19.8-22.1 mW, and operate at up to 1.84 GHz. Application-level evaluation across image-classification, ADAS, and edge-inference workloads shows that the evaluated Posit-16 and Posit-32 configurations remain within about 1.5 percentage points of FP32 accuracy. A TinyYOLOv3 prototype on Pynq-Z2 achieves 78 ms latency at 0.29 W and 22.6 mJ/frame, demonstrating the suitability of EULER-ADAS for low-power real-time ADAS inference.
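The abstract does not spell out the stage-adaptive logarithmic multiplier, so the sketch below is only an illustration of the general idea, using the classic Mitchell approximation log2(1 + f) ≈ f as a stand-in. With that approximation, multiplying two mantissas reduces to adding their fraction fields, and bit truncation simply zeroes low-order fraction bits before the add. The function names, the 8-bit fraction width, and the `keep_bits` parameter are assumptions chosen for the example and are not taken from the paper.

```python
def truncate(frac: int, frac_bits: int, keep_bits: int) -> int:
    """Zero the low (frac_bits - keep_bits) bits of a fixed-point fraction."""
    drop = frac_bits - keep_bits
    return (frac >> drop) << drop


def mitchell_mul(fa: int, fb: int, frac_bits: int, keep_bits: int) -> tuple[int, int]:
    """Approximate (1 + fa/2^F) * (1 + fb/2^F) with Mitchell's method.

    fa and fb are fraction fields with F = frac_bits bits (hidden 1 excluded).
    Returns (exp_carry, frac): the product is about 2^exp_carry * (1 + frac/2^F).
    This is an illustrative stand-in, not the paper's multiplier.
    """
    fa = truncate(fa, frac_bits, keep_bits)
    fb = truncate(fb, frac_bits, keep_bits)
    s = fa + fb                    # log-domain add replaces the multiply
    one = 1 << frac_bits
    if s >= one:                   # fraction sum overflowed into the exponent
        return 1, s - one
    return 0, s


def to_float(exp_carry: int, frac: int, frac_bits: int) -> float:
    """Convert an (exp_carry, frac) pair back to a real value for inspection."""
    return (2.0 ** exp_carry) * (1.0 + frac / (1 << frac_bits))


if __name__ == "__main__":
    F = 8
    # 1.25 * 1.5: exact result is 1.875, the approximation gives 1.75
    print(to_float(*mitchell_mul(64, 128, F, F), F))
    # 1.5 * 1.5: exact result is 2.25, the approximation gives 2.0
    print(to_float(*mitchell_mul(128, 128, F, F), F))
```

Mitchell's approximation never overestimates the product, and its relative error peaks at about 11.1% when both fractions equal 0.5, as in the second example. Truncating more fraction bits shortens the adder at the cost of additional error, which is the kind of accuracy versus hardware cost trade-off the abstract describes.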