🤖 AI Summary
To address the von Neumann bottleneck and the energy-efficiency limits imposed by large-scale matrix multiplication in AI models, this paper proposes OISMA, a compute-in-memory architecture. OISMA integrates a quasi-stochastic computing domain (employing Bent-Pyramid bitstream encoding) with the programmability and scalability of digital memory, enabling in-situ stochastic multiplication and bitstream accumulation using 1T1R RRAM devices; it achieves high accuracy without sacrificing efficiency or flexibility. Evaluated at 180 nm and 22 nm technology nodes, OISMA delivers an average relative Frobenius error of only 1.81% for 512×512 matrix multiplication, with an energy efficiency of 0.891 TOPS/W and an area efficiency of 3.98 GOPS/mm²; scaling to 22 nm yields a two-order-of-magnitude gain in energy efficiency and a one-order-of-magnitude gain in area efficiency. The key contribution is the first integration of structured stochastic computing into in-memory multiplication, overcoming the accuracy limitations of analogue compute-in-memory and the energy-efficiency bottlenecks of digital alternatives.
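The details of the Bent-Pyramid encoding are not given in this summary, but the in-situ multiplication it enables builds on the classical stochastic-computing idea that the product of two values encoded as random bitstreams can be computed with a single AND per bit pair. A minimal sketch of that conventional unipolar scheme (function names are my own; this is an illustration of the general principle, not the paper's encoder):

```python
import random

def to_bitstream(p, n, rng):
    # Unipolar stochastic encoding: each bit is 1 with probability p,
    # so the value p in [0, 1] is carried by the stream's duty cycle.
    return [1 if rng.random() < p else 0 for _ in range(n)]

def sc_multiply(x, y, n=4096, seed=0):
    # Multiply two values in [0, 1] by AND-ing independent bitstreams;
    # the fraction of 1s in the result estimates x * y.
    rng = random.Random(seed)
    a = to_bitstream(x, n, rng)
    b = to_bitstream(y, n, rng)
    return sum(ai & bi for ai, bi in zip(a, b)) / n

est = sc_multiply(0.6, 0.5)  # stochastic estimate of 0.30
```

Longer streams trade latency for accuracy, which is why accumulating many bitstreams across a large matrix (as OISMA's periphery does) averages out per-product noise.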
📝 Abstract
Artificial intelligence models are currently driven by a significant up-scaling of their complexity, with massive matrix multiplication workloads representing the major computational bottleneck. In-memory computing architectures have been proposed to avoid the von Neumann bottleneck. However, both digital/binary-based and analogue in-memory computing architectures suffer from limitations that significantly degrade their performance and energy-efficiency gains. This work proposes OISMA, a novel in-memory computing architecture that exploits the computational simplicity of a quasi-stochastic computing domain (the Bent-Pyramid system) while keeping the efficiency, scalability, and productivity of digital memories. OISMA converts normal memory read operations into in-situ stochastic multiplication operations at negligible cost. An accumulation periphery then accumulates the output multiplication bitstreams, realizing the matrix multiplication functionality. Extensive matrix multiplication benchmarking was conducted to analyze the accuracy of the Bent-Pyramid system, using matrix dimensions ranging from 4×4 to 512×512. The accuracy results show a significant decrease in the average relative Frobenius error, from 9.42% (4×4) to 1.81% (512×512), compared to the 64-bit double-precision floating-point format. A 1T1R OISMA array of 4 KB capacity was implemented using a commercial 180 nm technology node and in-house RRAM technology. At 50 MHz, OISMA achieves 0.891 TOPS/W and 3.98 GOPS/mm² in energy and area efficiency, respectively, occupying an effective computing area of 0.804241 mm². Scaling OISMA from 180 nm to 22 nm shows a significant improvement of two orders of magnitude in energy efficiency and one order of magnitude in area efficiency, compared to dense matrix multiplication in-memory computing architectures.
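The accuracy figures above use the relative Frobenius error against a double-precision reference. As a minimal sketch of that metric (the function name is my own; the standard definition is the Frobenius norm of the difference divided by the Frobenius norm of the reference, here reported as a percentage):

```python
import numpy as np

def rel_frobenius_error(approx, ref):
    # Relative Frobenius error in percent:
    # 100 * ||approx - ref||_F / ||ref||_F
    return 100.0 * np.linalg.norm(approx - ref) / np.linalg.norm(ref)

# Example: compare an approximate matrix product against an FP64 reference.
A = np.random.default_rng(0).random((512, 512))
B = np.random.default_rng(1).random((512, 512))
ref = A @ B                      # double-precision reference
approx = (A @ B).astype(np.float32)  # stand-in for a lower-precision result
err = rel_frobenius_error(approx, ref)
```

A 1.81% error for 512×512 products, as reported, would thus mean the aggregate deviation of the computed matrix is under 2% of the reference matrix's overall magnitude.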