🤖 AI Summary
To address two critical bottlenecks in compute-in-memory (CIM) acceleration of depthwise separable convolutions—low memory utilization and excessive on-chip buffer traffic (causing high latency and energy consumption)—this paper proposes a novel dataflow architecture. The architecture systematically optimizes buffer traffic for the first time, jointly orchestrating weight residency and feature reuse to maximize on-chip data reuse. Its theoretical foundation is rigorous and broadly applicable to edge AI lightweight models such as MobileNet and EfficientNet. Experimental evaluation on mainstream CIM hardware platforms demonstrates substantial improvements: buffer traffic reduced by 77.4%–87.0%, total data transfer energy decreased by 10.1%–17.9%, and inference latency reduced by 15.6%–27.8%. These gains significantly enhance both energy efficiency and throughput.
📝 Abstract
Computing-In-Memory (CIM) offers a potential solution to the memory wall issue and can achieve high energy efficiency by minimizing data movement, making it a promising architecture for edge AI devices. Lightweight models like MobileNet and EfficientNet, which utilize depthwise convolution for feature extraction, have been developed for these devices. However, CIM macros often face challenges in accelerating depthwise convolution, including underutilization of CIM memory and heavy buffer traffic. The latter, in particular, has been overlooked despite its significant impact on latency and energy consumption. To address this, we introduce a novel CIM dataflow that significantly reduces buffer traffic by maximizing data reuse and improving memory utilization during depthwise convolution. The proposed dataflow is grounded in solid theoretical principles, fully demonstrated in this paper. When applied to MobileNet and EfficientNet models, our dataflow reduces buffer traffic by 77.4-87.0%, leading to a total reduction in data traffic energy and latency by 10.1-17.9% and 15.6-27.8%, respectively, compared to the baseline (conventional weight-stationary dataflow).