DS-CIM: Digital Stochastic Computing-In-Memory Featuring Accurate OR-Accumulation via Sample Region Remapping for Edge AI Models

πŸ“… 2026-01-10
πŸ›οΈ arXiv.org
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This work addresses the trade-off between hardware simplicity and computational throughput in edge AI by proposing DS-CIM, a novel digital stochastic computing-in-memory (CIM) architecture. DS-CIM innovatively employs OR-gate-based unsigned multiply-accumulate operations, shared pseudo-random number generators (PRNGs), and two-dimensional activation partitioning to eliminate OR conflicts, while introducing sample-region remapping to mitigate the β€œ1-saturation” problem. These techniques collectively enhance model accuracy and robustness to sparsity. Evaluated on CIFAR-10, DS-CIM achieves 94.45% accuracy (RMSE = 0.74%), with an energy efficiency of 3566.1 TOPS/W and area efficiency of 363.7 TOPS/mmΒ² (RMSE = 3.81%). The architecture further demonstrates scalability on larger models, including ResNet-50 and LLaMA-7B.
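The "1-saturation" problem the summary mentions can be seen in a toy simulation (an illustrative sketch, not the paper's circuit; the input values and stream length are made-up assumptions): when many unipolar stochastic bitstreams are merged through an OR gate, the output probability is 1 − ∏(1 − pᵢ), which flattens toward 1 instead of tracking the arithmetic sum.

```python
import random

random.seed(0)

def bitstream(p, n):
    """Unipolar stochastic bitstream: each bit is 1 with probability p."""
    return [1 if random.random() < p else 0 for _ in range(n)]

N = 4096                 # stream length (illustrative choice)
probs = [0.3] * 8        # eight inputs, each encoding the value 0.3

# OR-accumulate all streams bit-position-wise, as an OR-gate tree would.
streams = [bitstream(p, N) for p in probs]
or_out = [int(any(bits)) for bits in zip(*streams)]

estimate = sum(or_out) / N   # what the OR gate "reads out"
exact = sum(probs)           # the sum we actually wanted (2.4)
# P(OR=1) = 1 - prod(1-p) = 1 - 0.7**8 ~ 0.942, far below 2.4:
print(f"OR estimate {estimate:.3f} vs exact sum {exact:.1f}")
```

The saturated estimate cannot exceed 1.0 no matter how large the true sum is, which is why the paper needs sample-region remapping and mutually exclusive activation to make OR-accumulation accurate.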

πŸ“ Abstract
Stochastic computing (SC) offers hardware simplicity but suffers from low throughput, while high-throughput Digital Computing-in-Memory (DCIM) is bottlenecked by costly adder logic for matrix-vector multiplication (MVM). To address this trade-off, this paper introduces a digital stochastic CIM (DS-CIM) architecture that achieves both high accuracy and efficiency. We implement signed multiply-accumulation (MAC) in a compact, unsigned OR-based circuit by modifying the data representation. Throughput is enhanced by replicating this low-cost circuit 64 times with only a 1x area increase. Our core strategy, a shared Pseudo Random Number Generator (PRNG) with 2D partitioning, enables single-cycle mutually exclusive activation to eliminate OR-gate collisions. We also resolve the 1-saturation issue via stochastic process analysis and data remapping, significantly improving accuracy and resilience to input sparsity. Our high-accuracy DS-CIM1 variant achieves 94.45% accuracy for INT8 ResNet18 on CIFAR-10 with a root-mean-squared error (RMSE) of just 0.74%. Meanwhile, our high-efficiency DS-CIM2 variant attains an energy efficiency of 3566.1 TOPS/W and an area efficiency of 363.7 TOPS/mm^2, while maintaining a low RMSE of 3.81%. DS-CIM's scalability to larger models is further demonstrated through experiments with INT8 ResNet50 on ImageNet and the FP8 LLaMA-7B model.
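The "single-cycle mutually exclusive activation" idea can be sketched in a simplified time-partitioned form (an assumed toy mechanism loosely inspired by the abstract's shared-PRNG 2D partitioning, not the paper's actual circuit): if each cycle is owned by exactly one input, at most one stream can drive the OR gate high, collisions vanish, and the sum is recovered in expectation after rescaling.

```python
import random

random.seed(1)

def or_accumulate_partitioned(probs, n):
    """Collision-free OR accumulation sketch: cycle t is assigned to
    input t % k, so the OR of all k streams sees at most one 1 per
    cycle. The output rate is (1/k) * sum(probs); rescaling by k
    recovers the sum. All parameters here are illustrative."""
    k = len(probs)
    ones = 0
    for t in range(n):
        r = random.random()        # one shared PRNG sample per cycle
        owner = t % k              # time partition: only this input may fire
        if r < probs[owner]:
            ones += 1              # OR output of k mutually exclusive streams
    return ones / n * k            # undo the 1/k time-sharing factor

probs = [0.3] * 8
est = or_accumulate_partitioned(probs, 65536)
print(f"partitioned OR estimate {est:.3f} vs exact {sum(probs):.1f}")
```

Unlike the naive OR tree, this estimate tracks the true sum (2.4 here) rather than saturating below 1; the paper's 2D partitioning additionally spreads activations across the sample region rather than simple round-robin time slots.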
Problem

Research questions and friction points this paper is trying to address.

Stochastic Computing
Computing-in-Memory
Matrix-Vector Multiplication
Edge AI
Hardware Efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Stochastic Computing-in-Memory
OR-Accumulation
Sample Region Remapping
Shared PRNG with 2D Partitioning
Signed MAC via Unsigned OR
πŸ”Ž Similar Papers
No similar papers found.
Kunming Shao
The Hong Kong University of Science and Technology
Liang Zhao
South China University of Technology
Jiangnan Yu
The Hong Kong University of Science and Technology
LLM acceleration, AI accelerator, Network on Chip, MoE, Emerging Non-Volatile-Memory
Zhipeng Liao
Professor, Department of Economics, UCLA
economics, econometrics
Xiaomeng Wang
The Hong Kong University of Science and Technology
Yi Zou
Intel Labs
Near-data and in-memory computing, Computer Architecture and Computer Systems, Non-volatile storage, distributed storage, big da
Tim Kwang-Ting Cheng
The Hong Kong University of Science and Technology
Chi-Ying Tsui
The Hong Kong University of Science and Technology