🤖 AI Summary
To address the degradation in effective ADC bit-width, energy efficiency, and inference accuracy caused by the fixed dot-product voltage swing of charge-domain compute-in-memory (CIM) SRAMs, this work proposes IMAGINE, a charge-domain CIM accelerator tailored for low-bit CNN inference at the edge. It introduces a data-reshaping mechanism combining a linear, in-ADC analog batch normalization (ABN) with channel-wise dot-product array partitioning, and presents the first linear in-memory rescaling scheme, with computing precision scalable from 8-bit down to 1-bit. Implemented in 22-nm FD-SOI, the design integrates a 1152×256 end-to-end charge-domain SRAM macro using an input-serial, weight-parallel accumulation that avoids power-hungry DACs, augmented with CIM-aware training that incorporates the post-silicon equivalent noise. The system achieves an 8-bit system-level energy efficiency of 40 TOPS/W (at 0.3/0.6 V), macro-level peak energy efficiency of 0.15-8 POPS/W, and area efficiency of 2.6-154 TOPS/mm², a 3-to-5× improvement over prior charge-domain CIM designs, while maintaining competitive accuracy on MNIST and CIFAR-10.
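As a rough illustration of the CIM-aware training idea, the sketch below injects an equivalent analog noise into a convolution's output before a straight-through ADC quantizer. The additive-Gaussian noise model, the `noise_sigma` value, and the `NoisyQuantConv2d` name are illustrative assumptions for this sketch, not the paper's measured post-silicon noise characterization.

```python
import torch
import torch.nn as nn

class NoisyQuantConv2d(nn.Conv2d):
    """Conv layer emulating a charge-domain CIM macro during training:
    the analog dot-product is perturbed by an equivalent input-referred
    Gaussian noise, then quantized by an idealized signed ADC.
    Noise model and sigma are assumptions, not measured values."""

    def __init__(self, *args, adc_bits=8, noise_sigma=0.02, **kwargs):
        super().__init__(*args, **kwargs)
        self.adc_bits = adc_bits
        self.noise_sigma = noise_sigma  # std of the assumed equivalent noise

    def forward(self, x):
        y = super().forward(x)  # ideal dot-product (analog accumulation)
        if self.training:
            y = y + self.noise_sigma * torch.randn_like(y)  # equivalent CIM noise
        qmax = 2 ** (self.adc_bits - 1) - 1                 # signed ADC code range
        scale = y.detach().abs().amax().clamp(min=1e-8) / qmax
        y_q = torch.round(y / scale).clamp(-qmax - 1, qmax) * scale
        return y + (y_q - y).detach()  # straight-through estimator for backprop

# Drop-in replacement for nn.Conv2d during CIM-aware training:
layer = NoisyQuantConv2d(3, 16, kernel_size=3, padding=1, adc_bits=8, noise_sigma=0.02)
out = layer(torch.randn(1, 3, 32, 32))
```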
📝 Abstract
Charge-domain compute-in-memory (CIM) SRAMs have recently become an enticing compromise between computing efficiency and accuracy to process sub-8b convolutional neural networks (CNNs) at the edge. Yet, they commonly use a fixed dot-product (DP) voltage swing, which leads to a loss of effective ADC bits due to data-dependent clipping or truncation effects that waste conversion energy and degrade computing accuracy. To overcome this, we present IMAGINE, a workload-adaptive 1-to-8b CIM-CNN accelerator in 22-nm FD-SOI. It introduces a 1152×256 end-to-end charge-based macro with a multi-bit DP based on an input-serial, weight-parallel accumulation that avoids power-hungry DACs. An adaptive swing is achieved by combining a channel-wise DP array split with a linear, in-ADC implementation of analog batch normalization (ABN), obtaining a distribution-aware data reshaping. Critical design constraints are relaxed by including the post-silicon equivalent noise within a CIM-aware CNN training framework. Measurement results showcase an 8b system-level energy efficiency of 40 TOPS/W at 0.3/0.6 V, with competitive accuracies on MNIST and CIFAR-10. Moreover, the peak energy and area efficiencies of the 187-kB/mm² macro reach up to 0.15-8 POPS/W and 2.6-154 TOPS/mm², respectively, scaling with the 8-to-1b computing precision. These results exceed previous charge-based designs by 3-to-5× while being the first work to provide linear in-memory rescaling.
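To make the DAC-free, input-serial/weight-parallel accumulation concrete, the following minimal Python sketch models it digitally: each cycle applies one bit-plane of the multi-bit inputs to the stored full-precision weights, and the per-cycle partial sums are combined by shift-and-add. The function name, the unsigned-input simplification, and the omission of charge-domain non-idealities are our assumptions for illustration.

```python
import numpy as np

def bit_serial_dot_product(x, w, x_bits=8):
    """Idealized model of an input-serial, weight-parallel multi-bit DP:
    inputs are streamed one bit-plane per cycle (so no input DAC is needed),
    and partial sums are accumulated with binary weighting, MSB first."""
    x = np.asarray(x, dtype=np.int64)
    w = np.asarray(w, dtype=np.int64)
    acc = 0
    for b in range(x_bits - 1, -1, -1):  # MSB-first bit-serial schedule
        bit_plane = (x >> b) & 1         # 1-bit input vector for this cycle
        acc = 2 * acc + bit_plane @ w    # shift-and-add accumulation
    return acc

x = np.array([3, 5, 250, 17])  # 8-bit unsigned activations
w = np.array([1, -2, 4, -1])   # weights stored in the array
assert bit_serial_dot_product(x, w) == x @ w  # matches the exact dot product
```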