CIMinus: Empowering Sparse DNN Workloads Modeling and Exploration on SRAM-based CIM Architectures

📅 2025-11-20
📈 Citations: 0 (influential: 0)
🤖 AI Summary
SRAM-based compute-in-memory (CIM) architectures face three key challenges in accelerating sparse deep neural networks (DNNs): rigid array constraints, heterogeneous sparsity patterns, and the lack of a unified modeling framework for multi-macro CIM systems. To address these, this paper proposes CIMinus, the first unified framework for fine-grained sparsity modeling and analysis of multi-macro CIM systems. Its core contribution is a system-level co-modeling methodology for sparse DNNs and CIM hardware that integrates component-level energy breakdown with end-to-end latency estimation, enabling joint quantification of energy efficiency and latency across diverse dataflows and mapping strategies. Experimental evaluation shows that CIMinus accurately captures how sparsity patterns and mapping strategies affect real-world performance, predicting both acceleration gains and bottlenecks in two representative scenarios, thereby bridging the gap between theoretical design and hardware deployment.
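To make the co-modeling idea concrete, here is a minimal sketch of how a component-level energy breakdown can be combined with a latency estimate for one sparse layer mapped onto a single CIM macro. All names, energy constants, and formulas below are illustrative assumptions for exposition only, not CIMinus's actual model:

```python
# Illustrative toy cost model for a sparse matmul on one SRAM-CIM macro.
# Assumption: zero weights gate analog MAC energy, but the rigid array
# still spends a full cycle per activated tile, so unstructured sparsity
# saves energy without reducing latency.
from dataclasses import dataclass


@dataclass
class MacroConfig:
    rows: int           # SRAM array rows (reduction dimension per tile)
    cols: int           # array columns (parallel output channels)
    e_mac_pj: float     # energy per active analog MAC, in pJ
    e_adc_pj: float     # energy per ADC conversion, in pJ
    t_cycle_ns: float   # latency of one array activation, in ns


def model_layer(cfg: MacroConfig, m: int, k: int, n: int, density: float):
    """Estimate (energy_pJ, latency_ns) for an M x K x N matmul with the
    given nonzero-weight density, tiled onto the macro's rigid array."""
    row_tiles = -(-k // cfg.rows)            # ceil(K / rows)
    col_tiles = -(-n // cfg.cols)            # ceil(N / cols)
    activations = m * row_tiles * col_tiles  # array firings needed
    macs = m * k * n * density               # only nonzeros burn MAC energy
    adcs = activations * cfg.cols            # one conversion per column
    energy = macs * cfg.e_mac_pj + adcs * cfg.e_adc_pj
    latency = activations * cfg.t_cycle_ns   # sparsity-insensitive here
    return energy, latency
```

Even this crude sketch exposes the kind of gap the paper targets: halving weight density cuts MAC energy but leaves latency unchanged under a naive mapping, so realizing speedups requires sparsity-aware dataflows and mapping strategies rather than sparsity alone.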

📝 Abstract
Compute-in-memory (CIM) has emerged as a pivotal direction for accelerating workloads in the field of machine learning, such as Deep Neural Networks (DNNs). However, the effective exploitation of sparsity in CIM systems presents numerous challenges, due to the inherent limitations in their rigid array structures. Designing sparse DNN dataflows and developing efficient mapping strategies also become more complex when accounting for diverse sparsity patterns and the flexibility of a multi-macro CIM structure. Despite these complexities, there is still an absence of a unified systematic view and modeling approach for diverse sparse DNN workloads in CIM systems. In this paper, we propose CIMinus, a framework dedicated to cost modeling for sparse DNN workloads on CIM architectures. It provides an in-depth energy consumption analysis at the level of individual components and an assessment of the overall workload latency. We validate CIMinus against contemporary CIM architectures and demonstrate its applicability in two use-cases. These cases provide valuable insights into both the impact of sparsity patterns and the effectiveness of mapping strategies, bridging the gap between theoretical design and practical implementation.
Problem

Research questions and friction points this paper is trying to address.

Modeling sparse DNN workloads on SRAM-based CIM architectures
Addressing challenges of exploiting sparsity in rigid CIM array structures
Providing energy and latency analysis for sparse DNN dataflows
Innovation

Methods, ideas, or system contributions that make the work stand out.

Framework for sparse DNN workload cost modeling
Analyzes energy consumption at component level
Assesses overall workload latency on CIM architectures
Yingjie Qi
School of Computer Science and Engineering, Beihang University, Beijing, 100191, China

Jianlei Yang
Beihang University
Deep Learning, Computer Architecture, Neuromorphic Computing, Spintronics, EDA/VLSI

Rubing Yang
University of Pennsylvania
Deep Learning, Machine Perception

Cenlin Duan
School of Integrated Circuit Science and Engineering, Beihang University, Beijing, 100191, China

Xiaolin He
School of Computer Science and Engineering, Beihang University, Beijing, 100191, China

Ziyan He
School of Telecommunications Engineering, Xidian University, Xi’an, 710071, China

Weitao Pan
School of Telecommunications Engineering, Xidian University, Xi’an, 710071, China

Weisheng Zhao
Fert Beijing Institute, Beihang University
Spintronics Devices and Integrated Circuits