CIMR-V: An End-to-End SRAM-based CIM Accelerator with RISC-V for AI Edge Device

📅 2024-05-19
🏛️ International Symposium on Circuits and Systems
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the high DRAM access latency and lack of end-to-end model inference capability in SRAM-based compute-in-memory (CIM) accelerators for AI edge devices, this work proposes the first SRAM-CIM accelerator architecture supporting full-model inference. The design innovatively integrates a CIM-RISC-V heterogeneous computing paradigm with a CIM-optimized instruction set, implemented in TSMC 28nm CMOS technology. It incorporates SRAM-based CIM macros, hardware-pipelined convolution and max-pooling units, weight reuse scheduling, and weight fusion mechanisms. Experimental results on keyword spotting demonstrate an 85.14% reduction in inference latency, achieving 3707.84 TOPS/W energy efficiency and a peak throughput of 26.21 TOPS at 50 MHz. This is the first demonstration of complete end-to-end model inference on an SRAM-CIM accelerator, simultaneously delivering high energy efficiency and strong programmability.

📝 Abstract
Computing-in-memory (CIM) is renowned in deep learning due to its high energy efficiency resulting from highly parallel computing with minimal data movement. However, current SRAM-based CIM designs suffer from long latency when loading weights or feature maps from DRAM for large AI models. Moreover, previous SRAM-based CIM architectures lack end-to-end model inference. To address these issues, this paper proposes CIMR-V, an end-to-end CIM accelerator with RISC-V that incorporates CIM layer fusion, a convolution/max-pooling pipeline, and weight fusion, resulting in an 85.14% reduction in latency for the keyword spotting model. Furthermore, the proposed CIM-type instructions facilitate end-to-end AI model inference and a full-stack flow, effectively synergizing the high energy efficiency of CIM with the high programmability of RISC-V. Implemented in TSMC 28nm technology, the proposed design achieves an energy efficiency of 3707.84 TOPS/W and 26.21 TOPS at 50 MHz.
Problem

Research questions and friction points this paper is trying to address.

Long DRAM access latency when loading weights and feature maps for large AI models in SRAM-based CIM
Lack of end-to-end model inference support in prior SRAM-based CIM architectures
Need to combine the high energy efficiency of CIM with the programmability of RISC-V
Innovation

Methods, ideas, or system contributions that make the work stand out.

End-to-end SRAM-based CIM accelerator
CIM layer and weight fusion
RISC-V with CIM-type instructions
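The convolution/max-pooling pipeline fuses the two layers so that the full convolution output never needs to be written back to off-chip memory: each 2x2 pooling window's four convolution results are produced and reduced on the fly. A minimal NumPy sketch of this fusion idea (purely illustrative, not the actual CIM hardware pipeline; the function names and the single-channel 3x3/2x2 configuration are assumptions for the example):

```python
import numpy as np

def conv3x3(x, w):
    """Reference: valid 3x3 convolution over a single-channel map."""
    H, W = x.shape
    out = np.zeros((H - 2, W - 2))
    for i in range(H - 2):
        for j in range(W - 2):
            out[i, j] = np.sum(x[i:i + 3, j:j + 3] * w)
    return out

def fused_conv_maxpool(x, w):
    """Fused conv + 2x2 max pooling: each pooling window's four
    convolution outputs are computed on the fly and reduced immediately,
    so the intermediate conv map is never materialized."""
    H, W = x.shape
    Hc, Wc = H - 2, W - 2                    # valid-conv output size
    out = np.full((Hc // 2, Wc // 2), -np.inf)
    for i in range(Hc // 2):
        for j in range(Wc // 2):
            for di in range(2):              # 2x2 pooling window
                for dj in range(2):
                    r, c = 2 * i + di, 2 * j + dj
                    v = np.sum(x[r:r + 3, c:c + 3] * w)  # one MAC result
                    out[i, j] = max(out[i, j], v)        # reduce in place
    return out
```

In hardware, the same dataflow lets the max-pooling unit consume MAC outputs directly from the CIM macros, which is one source of the latency reduction the paper reports.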
Yan-Cheng Guo
Institute of Electronics, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
Tian-Sheuan Chang
Institute of Electronics, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
Chih-Sheng Lin
Industrial Technology Research Institute, Hsinchu, Taiwan
Bo-Cheng Chiou
Industrial Technology Research Institute, Hsinchu, Taiwan
Chih-Ming Lai
Industrial Technology Research Institute, Hsinchu, Taiwan
S. Sheu
Industrial Technology Research Institute, Hsinchu, Taiwan
Wei-Chung Lo
Industrial Technology Research Institute, Hsinchu, Taiwan
Shih-Chieh Chang
Computer Science, National Tsing Hua University