Computing-In-Memory Dataflow for Minimal Buffer Traffic

📅 2025-08-19

📈 Citations: 0

✨ Influential: 0

career value

243K/year

🤖 AI Summary

To address two critical bottlenecks in compute-in-memory (CIM) acceleration of depthwise separable convolutions—low memory utilization and excessive on-chip buffer traffic (causing high latency and energy consumption)—this paper proposes a novel dataflow architecture. The architecture systematically optimizes buffer traffic for the first time, jointly orchestrating weight residency and feature reuse to maximize on-chip data reuse. Its theoretical foundation is rigorous and broadly applicable to edge AI lightweight models such as MobileNet and EfficientNet. Experimental evaluation on mainstream CIM hardware platforms demonstrates substantial improvements: buffer traffic reduced by 77.4%–87.0%, total data transfer energy decreased by 10.1%–17.9%, and inference latency reduced by 15.6%–27.8%. These gains significantly enhance both energy efficiency and throughput.

Technology Category

Application Category

📝 Abstract

Computing-In-Memory (CIM) offers a potential solution to the memory wall issue and can achieve high energy efficiency by minimizing data movement, making it a promising architecture for edge AI devices. Lightweight models like MobileNet and EfficientNet, which utilize depthwise convolution for feature extraction, have been developed for these devices. However, CIM macros often face challenges in accelerating depthwise convolution, including underutilization of CIM memory and heavy buffer traffic. The latter, in particular, has been overlooked despite its significant impact on latency and energy consumption. To address this, we introduce a novel CIM dataflow that significantly reduces buffer traffic by maximizing data reuse and improving memory utilization during depthwise convolution. The proposed dataflow is grounded in solid theoretical principles, fully demonstrated in this paper. When applied to MobileNet and EfficientNet models, our dataflow reduces buffer traffic by 77.4-87.0%, leading to a total reduction in data traffic energy and latency by 10.1-17.9% and 15.6-27.8%, respectively, compared to the baseline (conventional weight-stationary dataflow).

Problem

Research questions and friction points this paper is trying to address.

Reducing buffer traffic in CIM depthwise convolution

Addressing memory underutilization in edge AI accelerators

Minimizing energy and latency in mobile network inference

Innovation

Methods, ideas, or system contributions that make the work stand out.

Novel CIM dataflow minimizes buffer traffic

Maximizes data reuse in depthwise convolution

Improves memory utilization for energy efficiency

🔎 Similar Papers

No similar papers found.