DS-CIM: Digital Stochastic Computing-In-Memory Featuring Accurate OR-Accumulation via Sample Region Remapping for Edge AI Models

πŸ“… 2026-01-10
πŸ›οΈ arXiv.org
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This work addresses the trade-off between hardware simplicity and computational throughput in edge AI by proposing DS-CIM, a novel digital stochastic computing-in-memory (CIM) architecture. DS-CIM innovatively employs OR-gate-based unsigned multiply-accumulate operations, shared pseudo-random number generators (PRNGs), and two-dimensional activation partitioning to eliminate OR conflicts, while introducing sample-region remapping to mitigate the β€œ1-saturation” problem. These techniques collectively enhance model accuracy and robustness to sparsity. Evaluated on CIFAR-10, DS-CIM achieves 94.45% accuracy (RMSE = 0.74%), with an energy efficiency of 3566.1 TOPS/W and area efficiency of 363.7 TOPS/mmΒ² (RMSE = 3.81%). The architecture further demonstrates scalability on larger models, including ResNet-50 and LLaMA-7B.
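The "1-saturation" problem the summary mentions can be seen in a toy simulation (an illustrative sketch, not the paper's circuit; the input values and stream length are made-up assumptions): when many unipolar stochastic bitstreams are merged through an OR gate, the output probability is 1 − ∏(1 − pᵢ), which flattens toward 1 instead of tracking the arithmetic sum.

```python
import random

random.seed(0)

def bitstream(p, n):
    """Unipolar stochastic bitstream: each bit is 1 with probability p."""
    return [1 if random.random() < p else 0 for _ in range(n)]

N = 4096                 # stream length (illustrative choice)
probs = [0.3] * 8        # eight inputs, each encoding the value 0.3

# OR-accumulate all streams bit-position-wise, as an OR-gate tree would.
streams = [bitstream(p, N) for p in probs]
or_out = [int(any(bits)) for bits in zip(*streams)]

estimate = sum(or_out) / N   # what the OR gate "reads out"
exact = sum(probs)           # the sum we actually wanted (2.4)
# P(OR=1) = 1 - prod(1-p) = 1 - 0.7**8 ~ 0.942, far below 2.4:
print(f"OR estimate {estimate:.3f} vs exact sum {exact:.1f}")
```

The saturated estimate cannot exceed 1.0 no matter how large the true sum is, which is why the paper needs sample-region remapping and mutually exclusive activation to make OR-accumulation accurate.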

πŸ“ Abstract
Stochastic computing (SC) offers hardware simplicity but suffers from low throughput, while high-throughput Digital Computing-in-Memory (DCIM) is bottlenecked by costly adder logic for matrix-vector multiplication (MVM). To address this trade-off, this paper introduces a digital stochastic CIM (DS-CIM) architecture that achieves both high accuracy and efficiency. We implement signed multiply-accumulation (MAC) in a compact, unsigned OR-based circuit by modifying the data representation. Throughput is enhanced by replicating this low-cost circuit 64 times with only a 1x area increase. Our core strategy, a shared Pseudo Random Number Generator (PRNG) with 2D partitioning, enables single-cycle mutually exclusive activation to eliminate OR-gate collisions. We also resolve the 1-saturation issue via stochastic process analysis and data remapping, significantly improving accuracy and resilience to input sparsity. Our high-accuracy DS-CIM1 variant achieves 94.45% accuracy for INT8 ResNet18 on CIFAR-10 with a root-mean-squared error (RMSE) of just 0.74%. Meanwhile, our high-efficiency DS-CIM2 variant attains an energy efficiency of 3566.1 TOPS/W and an area efficiency of 363.7 TOPS/mm^2, while maintaining a low RMSE of 3.81%. DS-CIM's scalability to larger models is further demonstrated through experiments with INT8 ResNet50 on ImageNet and the FP8 LLaMA-7B model.
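The "single-cycle mutually exclusive activation" idea can be sketched in a simplified time-partitioned form (an assumed toy mechanism loosely inspired by the abstract's shared-PRNG 2D partitioning, not the paper's actual circuit): if each cycle is owned by exactly one input, at most one stream can drive the OR gate high, collisions vanish, and the sum is recovered in expectation after rescaling.

```python
import random

random.seed(1)

def or_accumulate_partitioned(probs, n):
    """Collision-free OR accumulation sketch: cycle t is assigned to
    input t % k, so the OR of all k streams sees at most one 1 per
    cycle. The output rate is (1/k) * sum(probs); rescaling by k
    recovers the sum. All parameters here are illustrative."""
    k = len(probs)
    ones = 0
    for t in range(n):
        r = random.random()        # one shared PRNG sample per cycle
        owner = t % k              # time partition: only this input may fire
        if r < probs[owner]:
            ones += 1              # OR output of k mutually exclusive streams
    return ones / n * k            # undo the 1/k time-sharing factor

probs = [0.3] * 8
est = or_accumulate_partitioned(probs, 65536)
print(f"partitioned OR estimate {est:.3f} vs exact {sum(probs):.1f}")
```

Unlike the naive OR tree, this estimate tracks the true sum (2.4 here) rather than saturating below 1; the paper's 2D partitioning additionally spreads activations across the sample region rather than simple round-robin time slots.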
Problem

Research questions and friction points this paper is trying to address.

Stochastic Computing
Computing-in-Memory
Matrix-Vector Multiplication
Edge AI
Hardware Efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Stochastic Computing-in-Memory
OR-Accumulation
Sample Region Remapping
Shared PRNG with 2D Partitioning
Signed MAC via Unsigned OR
πŸ”Ž Similar Papers
No similar papers found.
Kunming Shao
The Hong Kong University of Science and Technology
Liang Zhao
South China University of Technology
Jiangnan Yu
The Hong Kong University of Science and Technology
LLM acceleration, AI accelerator, Network on Chip, MoE, Emerging Non-Volatile-Memory
Zhipeng Liao
Professor, Department of Economics, UCLA
economics, econometrics
Xiaomeng Wang
The Hong Kong University of Science and Technology
Yi Zou
Intel Labs
Near-data and in-memory computing, Computer Architecture and Computer Systems, Non-volatile storage, distributed storage, big da
Tim Kwang-Ting Cheng
The Hong Kong University of Science and Technology
Chi-Ying Tsui
The Hong Kong University of Science and Technology