🤖 AI Summary
To address the degradation in effective ADC bit-width, energy efficiency, and inference accuracy caused by the fixed dot-product voltage swing of charge-domain compute-in-memory (CIM) SRAMs, this work proposes IMAGINE, a charge-domain CIM accelerator tailored for low-bit CNN inference at the edge. It introduces a data-reshaping mechanism combining a linear, in-ADC analog batch normalization (ABN) with channel-wise dot-product array partitioning, and presents the first linear in-memory rescaling scheme, with computing precision scalable from 8-bit down to 1-bit. Implemented in 22-nm FD-SOI, the design integrates a 1152×256 end-to-end charge-domain SRAM macro using an input-serial, weight-parallel accumulation that avoids power-hungry DACs, augmented with CIM-aware training that incorporates the post-silicon equivalent noise. The system achieves an 8-bit system-level energy efficiency of 40 TOPS/W (at 0.3/0.6 V), macro-level peak energy efficiency of 0.15-8 POPS/W, and area efficiency of 2.6-154 TOPS/mm², a 3-to-5× improvement over prior charge-domain CIM designs, while maintaining competitive accuracy on MNIST and CIFAR-10.
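As a rough illustration of the CIM-aware training idea, the sketch below injects an equivalent analog noise into a convolution's output before a straight-through ADC quantizer. The additive-Gaussian noise model, the `noise_sigma` value, and the `NoisyQuantConv2d` name are illustrative assumptions for this sketch, not the paper's measured post-silicon noise characterization.

```python
import torch
import torch.nn as nn

class NoisyQuantConv2d(nn.Conv2d):
    """Conv layer emulating a charge-domain CIM macro during training:
    the analog dot-product is perturbed by an equivalent input-referred
    Gaussian noise, then quantized by an idealized signed ADC.
    Noise model and sigma are assumptions, not measured values."""

    def __init__(self, *args, adc_bits=8, noise_sigma=0.02, **kwargs):
        super().__init__(*args, **kwargs)
        self.adc_bits = adc_bits
        self.noise_sigma = noise_sigma  # std of the assumed equivalent noise

    def forward(self, x):
        y = super().forward(x)  # ideal dot-product (analog accumulation)
        if self.training:
            y = y + self.noise_sigma * torch.randn_like(y)  # equivalent CIM noise
        qmax = 2 ** (self.adc_bits - 1) - 1                 # signed ADC code range
        scale = y.detach().abs().amax().clamp(min=1e-8) / qmax
        y_q = torch.round(y / scale).clamp(-qmax - 1, qmax) * scale
        return y + (y_q - y).detach()  # straight-through estimator for backprop

# Drop-in replacement for nn.Conv2d during CIM-aware training:
layer = NoisyQuantConv2d(3, 16, kernel_size=3, padding=1, adc_bits=8, noise_sigma=0.02)
out = layer(torch.randn(1, 3, 32, 32))
```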
📝 Abstract
Charge-domain compute-in-memory (CIM) SRAMs have recently become an enticing compromise between computing efficiency and accuracy to process sub-8b convolutional neural networks (CNNs) at the edge. Yet, they commonly use a fixed dot-product (DP) voltage swing, which leads to a loss of effective ADC bits due to data-dependent clipping or truncation effects that waste conversion energy and degrade computing accuracy. To overcome this, we present IMAGINE, a workload-adaptive 1-to-8b CIM-CNN accelerator in 22-nm FD-SOI. It introduces a 1152×256 end-to-end charge-based macro with a multi-bit DP based on an input-serial, weight-parallel accumulation that avoids power-hungry DACs. An adaptive swing is achieved by combining a channel-wise DP array split with a linear, in-ADC implementation of analog batch normalization (ABN), obtaining a distribution-aware data reshaping. Critical design constraints are relaxed by including the post-silicon equivalent noise within a CIM-aware CNN training framework. Measurement results showcase an 8b system-level energy efficiency of 40 TOPS/W at 0.3/0.6 V, with competitive accuracies on MNIST and CIFAR-10. Moreover, the peak energy and area efficiencies of the 187-kB/mm² macro reach up to 0.15-8 POPS/W and 2.6-154 TOPS/mm², respectively, scaling with the 8-to-1b computing precision. These results exceed previous charge-based designs by 3-to-5× while being the first work to provide linear in-memory rescaling.
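To make the DAC-free, input-serial/weight-parallel accumulation concrete, the following minimal Python sketch models it digitally: each cycle applies one bit-plane of the multi-bit inputs to the stored full-precision weights, and the per-cycle partial sums are combined by shift-and-add. The function name, the unsigned-input simplification, and the omission of charge-domain non-idealities are our assumptions for illustration.

```python
import numpy as np

def bit_serial_dot_product(x, w, x_bits=8):
    """Idealized model of an input-serial, weight-parallel multi-bit DP:
    inputs are streamed one bit-plane per cycle (so no input DAC is needed),
    and partial sums are accumulated with binary weighting, MSB first."""
    x = np.asarray(x, dtype=np.int64)
    w = np.asarray(w, dtype=np.int64)
    acc = 0
    for b in range(x_bits - 1, -1, -1):  # MSB-first bit-serial schedule
        bit_plane = (x >> b) & 1         # 1-bit input vector for this cycle
        acc = 2 * acc + bit_plane @ w    # shift-and-add accumulation
    return acc

x = np.array([3, 5, 250, 17])  # 8-bit unsigned activations
w = np.array([1, -2, 4, -1])   # weights stored in the array
assert bit_serial_dot_product(x, w) == x @ w  # matches the exact dot product
```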