🤖 AI Summary
This work addresses a key limitation of SRAM-based compute-in-memory (CIM) accelerators: their limited on-chip capacity forces costly on/off-chip data movement when running large deep neural networks, a problem that prior studies largely sidestep by assuming the model fits entirely on-chip. The paper introduces AccelCIM, a framework that formulates a systematic dataflow design space spanning both CIM macro configurations and macro-array organizations. It evaluates candidate designs with cycle-accurate architectural simulation and post-layout power-performance-area (PPA) analysis. Through extensive design space exploration, the authors apply AccelCIM to representative large language model (LLM) workloads and derive practical insights for the principled design of SRAM CIM accelerators.
📝 Abstract
SRAM-based compute-in-memory (CIM) offers high computational density and energy efficiency for deep neural network (DNN) accelerators, but its limited capacity causes on/off-chip data movement overhead for large DNN models. Existing CIM accelerator studies typically assume that DNN models fit entirely on-chip, leaving efficient dataflow design largely unexplored. This paper introduces AccelCIM, a systematic dataflow exploration framework for SRAM CIM accelerators that addresses two key limitations of prior work. (1) It formulates a systematic dataflow design space spanning CIM macro configurations and macro-array organizations. (2) It introduces rigorous design evaluation using cycle-accurate architectural simulation and post-layout power-performance-area (PPA) analysis. We conduct an extensive design space exploration and apply AccelCIM to representative large language model (LLM) applications, providing practical insights for the principled design of CIM accelerators.