🤖 AI Summary
To address the energy inefficiency of von Neumann architectures in edge computing and the high integration complexity and poor software support of existing compute-in-memory (CIM) solutions, this paper proposes a software-friendly near-memory computing (NMC) architecture with low integration overhead. We introduce two novel, configurable NMC microarchitectures, NM-Caesar and NM-Carus, that target different trade-offs among area efficiency, performance, and flexibility. To our knowledge, this is the first NMC design supporting native RISC-V programming via custom instruction extensions and synergistic in-memory/near-memory execution. It integrates an 8-bit quantized matrix multiplication engine. Evaluated in post-layout simulations against an RV32IMC CPU, our system achieves up to 53.9× lower execution time and up to 35.6× higher energy efficiency. NM-Carus attains a peak energy efficiency of 306.7 GOPS/W in 8-bit matrix multiplications, surpassing recent state-of-the-art in- and near-memory circuits.
📝 Abstract
The widespread adoption of data-centric algorithms, particularly Artificial Intelligence (AI) and Machine Learning (ML), has exposed the limitations of centralized processing infrastructures, driving a shift towards edge computing. This necessitates stringent constraints on energy efficiency, which traditional von Neumann architectures struggle to meet. The Compute-In-Memory (CIM) paradigm has emerged as a superior candidate due to its efficient exploitation of available memory bandwidth. However, existing CIM solutions require high implementation effort and lack flexibility from a software integration standpoint. This work proposes a novel, software-friendly, general-purpose, and low-integration-effort Near-Memory Computing (NMC) approach, paving the way for the adoption of CIM-based systems in the next generation of edge computing nodes. Two architectural variants, NM-Caesar and NM-Carus, are proposed and characterized to target different trade-offs in area efficiency, performance, and flexibility, covering a wide range of embedded microcontrollers. Post-layout simulations show up to $28.0\times$ and $53.9\times$ lower execution time and $25.0\times$ and $35.6\times$ higher energy efficiency at the system level, respectively, compared to executing the same tasks on a state-of-the-art RISC-V CPU (RV32IMC). NM-Carus achieves a peak energy efficiency of $306.7$ GOPS/W in 8-bit matrix multiplications, surpassing recent state-of-the-art in- and near-memory circuits.