Computing-In-Memory Dataflow for Minimal Buffer Traffic

📅 2025-08-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address two critical bottlenecks in compute-in-memory (CIM) acceleration of depthwise separable convolutions—low memory utilization and excessive on-chip buffer traffic (a major source of latency and energy consumption)—this paper proposes a novel dataflow architecture. The architecture is the first to systematically optimize buffer traffic, jointly orchestrating weight residency and feature reuse to maximize on-chip data reuse. Its theoretical foundation is rigorous and applies broadly to lightweight edge-AI models such as MobileNet and EfficientNet. Evaluation on mainstream CIM hardware platforms demonstrates substantial improvements over a conventional weight-stationary baseline: buffer traffic reduced by 77.4%–87.0%, total data-transfer energy by 10.1%–17.9%, and inference latency by 15.6%–27.8%. These gains significantly enhance both energy efficiency and throughput.

📝 Abstract
Computing-In-Memory (CIM) offers a potential solution to the memory wall issue and can achieve high energy efficiency by minimizing data movement, making it a promising architecture for edge AI devices. Lightweight models like MobileNet and EfficientNet, which utilize depthwise convolution for feature extraction, have been developed for these devices. However, CIM macros often face challenges in accelerating depthwise convolution, including underutilization of CIM memory and heavy buffer traffic. The latter, in particular, has been overlooked despite its significant impact on latency and energy consumption. To address this, we introduce a novel CIM dataflow that significantly reduces buffer traffic by maximizing data reuse and improving memory utilization during depthwise convolution. The proposed dataflow is grounded in solid theoretical principles, fully demonstrated in this paper. When applied to MobileNet and EfficientNet models, our dataflow reduces buffer traffic by 77.4-87.0%, leading to a total reduction in data traffic energy and latency by 10.1-17.9% and 15.6-27.8%, respectively, compared to the baseline (conventional weight-stationary dataflow).
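To make the buffer-traffic argument concrete, the sketch below implements a single-channel depthwise convolution and a first-order traffic model contrasting a no-reuse scheme (every output refetches its full K×K input window from the buffer) with an ideal full-reuse scheme (every input pixel is fetched once). This is an illustrative assumption, not the paper's actual dataflow or accounting; the dimensions H, W, K are hypothetical.

```python
def depthwise_conv(inp, kernel):
    """Single-channel depthwise convolution (valid padding, stride 1).
    inp: H x W list of lists; kernel: K x K list of lists."""
    H, W, K = len(inp), len(inp[0]), len(kernel)
    out = []
    for i in range(H - K + 1):
        row = []
        for j in range(W - K + 1):
            # Accumulate the K x K window element-wise product.
            acc = 0
            for ki in range(K):
                for kj in range(K):
                    acc += inp[i + ki][j + kj] * kernel[ki][kj]
            row.append(acc)
        out.append(row)
    return out

def traffic_no_reuse(H, W, K):
    # Every output pixel refetches its full K x K input window.
    return (H - K + 1) * (W - K + 1) * K * K

def traffic_full_reuse(H, W, K):
    # Ideal reuse: each input pixel is fetched from the buffer once.
    return H * W

H, W, K = 32, 32, 3  # hypothetical feature-map and kernel sizes
saved = 1 - traffic_full_reuse(H, W, K) / traffic_no_reuse(H, W, K)
print(f"input-traffic reduction under ideal reuse: {saved:.1%}")
```

Even this crude model shows that overlapping K×K windows make input refetches the dominant buffer traffic for depthwise layers, which is the inefficiency the proposed dataflow targets; the paper's reported 77.4–87.0% reduction also accounts for weights, outputs, and real buffer constraints.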
Problem

Research questions and friction points this paper is trying to address.

Reducing buffer traffic in CIM depthwise convolution
Addressing memory underutilization in edge AI accelerators
Minimizing energy and latency in mobile network inference
Innovation

Methods, ideas, or system contributions that make the work stand out.

Novel CIM dataflow minimizes buffer traffic
Maximizes data reuse in depthwise convolution
Improves memory utilization for energy efficiency
Choongseok Song
Division of Materials Science and Engineering, Hanyang University, Seoul, Republic of Korea
Doo Seok Jeong
Hanyang University
Neuromorphic hardware design, Spiking neural network theory, Learning algorithms, Deep learning acceleration, Nonvolatile memory