🤖 AI Summary
This work proposes an in-memory logic architecture based on magnetic tunnel junctions (MTJs) to address the latency and energy bottlenecks caused by data movement in von Neumann architectures. The design exploits the intrinsic stochasticity of MTJs to convert binary operands into probabilistic bitstreams deterministically and in parallel, eliminating the need for external random sources. It also tightly integrates bitstream generation, memory storage, and stochastic computing units within the memory array, which enables efficient, low-complexity execution of both arithmetic and transcendental functions. By incorporating parallel accumulation and result-reuse mechanisms, the architecture substantially reduces data-movement overhead while maintaining high energy efficiency and parallelism, and the stochastic nature of its computation makes it inherently tolerant of noise.
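To make the basic stochastic-computing (SC) mechanisms concrete, here is a minimal software sketch of three of them: converting a binary operand into a bitstream, multiplying two streams with an AND gate, and accumulating a stream back to binary. It also shows why SC tolerates noise. This is not the authors' implementation. The functions `to_bitstream`, `sc_multiply`, `to_binary`, and `flip_bits` are hypothetical names, and drawing each bit with a pseudo-random generator is an assumption that merely stands in for stochastic MTJ switching.

```python
import random

def to_bitstream(value, bits, length, rng):
    """Convert a `bits`-wide unsigned operand to a unipolar stochastic bitstream.

    Each bit is 1 with probability p = value / 2**bits.
    ASSUMPTION: rng.random() < p is a software stand-in for one stochastic
    MTJ switching event; it does not model the paper's actual write/read scheme.
    """
    p = value / (1 << bits)
    return [1 if rng.random() < p else 0 for _ in range(length)]

def sc_multiply(a, b):
    # Unipolar SC multiplication is a bitwise AND. For independent
    # (uncorrelated) streams, P(a_i & b_i = 1) = p_a * p_b.
    return [x & y for x, y in zip(a, b)]

def to_binary(stream, bits):
    # Accumulate (popcount) the stream and rescale to a `bits`-wide integer.
    return round(sum(stream) * (1 << bits) / len(stream))

def flip_bits(stream, count, rng):
    # Model noise by flipping `count` distinct, randomly chosen bits.
    noisy = list(stream)
    for i in rng.sample(range(len(noisy)), count):
        noisy[i] ^= 1
    return noisy

rng = random.Random(0)
BITS, L = 8, 4096
a = to_bitstream(192, BITS, L, rng)      # p_a = 0.75
b = to_bitstream(128, BITS, L, rng)      # p_b = 0.5
prod = sc_multiply(a, b)                 # expected p = 0.375, i.e. about 96 / 256
noisy = flip_bits(prod, L // 100, rng)   # corrupt roughly 1% of the bits
print(to_binary(prod, BITS), to_binary(noisy, BITS))
```

Both printed values should land close to 96. A single flipped bit shifts the decoded value by only 1/L of full scale. By contrast, a single flipped most-significant bit in a conventional binary word shifts it by half the range. This is the sense in which SC degrades gracefully under noise.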
📝 Abstract
Today's high-performance architectures are increasingly constrained by data movement latency and energy overhead, as the slowdown of single-core performance scaling coincides with the rise of highly data-intensive workloads. In-memory architectures have emerged as a complementary solution to conventional von Neumann systems by alleviating memory bandwidth bottlenecks, exploiting massive concurrency, and mitigating excessive data movement between memory and processing units. This study proposes a parallel in-memory stochastic computing (SC) architecture that implements an end-to-end computation pipeline within Magnetic Tunnel Junction (MTJ)-based memory augmented with logic-in-memory (LIM) capabilities. By leveraging the inherent stochasticity and write-read characteristics of MTJ devices, the proposed architecture enables a fully parallel and deterministic conversion of binary operands into probabilistic bit-streams, eliminating the need for energy-intensive external random number generation circuitry. These bit-streams are processed by parallel stochastic arithmetic units integrated directly within the memory arrays to efficiently implement core arithmetic and transcendental functions with minimal hardware complexity and inherent noise tolerance. The resulting stochastic outputs can either be reused as inputs to subsequent stochastic operations or be converted back to binary form using parallel accumulation mechanisms and stored in the MTJ memory. By tightly integrating data storage, bit-stream generation, and computation within a unified in-memory fabric, the proposed design maximizes memory-level parallelism while substantially reducing data movement.
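The abstract does not say how the transcendental functions are realized. One well-known SC technique evaluates a Bernstein polynomial with an adder and a multiplexer. The sketch below uses that technique to approximate exp(-x), and then feeds the stochastic output directly into another SC operation without converting it back to binary. The technique, the coefficient choice b_k = f(k/n), and all names (`bitstream`, `bernstein_sc`, `value`) are assumptions for illustration only and may differ from the paper's method. Pseudo-random draws again stand in for MTJ switching.

```python
import math
import random

def bitstream(p, length, rng):
    # ASSUMPTION: rng.random() < p models one stochastic MTJ write/read per bit.
    return [1 if rng.random() < p else 0 for _ in range(length)]

def bernstein_sc(x, coeffs, length, rng):
    """Stochastically evaluate sum_k b_k * C(n,k) * x^k * (1-x)^(n-k).

    n = len(coeffs) - 1, and every coefficient b_k must lie in [0, 1].
    """
    n = len(coeffs) - 1
    xs = [bitstream(x, length, rng) for _ in range(n)]  # n independent copies of x
    bs = [bitstream(b, length, rng) for b in coeffs]    # one stream per coefficient
    out = []
    for t in range(length):
        k = sum(s[t] for s in xs)   # adder: count of 1s among the n x-bits (Binomial(n, x))
        out.append(bs[k][t])        # multiplexer: select bit t of coefficient stream b_k
    return out

def value(stream):
    # Decode a unipolar stream: fraction of 1s.
    return sum(stream) / len(stream)

def f(v):
    return math.exp(-v)

n = 3
coeffs = [f(k / n) for k in range(n + 1)]  # simple Bernstein approximation, b_k = f(k/n)

rng = random.Random(1)
L = 1 << 14
y = bernstein_sc(0.5, coeffs, L, rng)      # estimates B_3 f(0.5), about 0.632 (exp(-0.5) is about 0.607)
# Reuse y directly as an SC input: AND with a fresh, independent x stream gives about x * B_3 f(x).
z = [a & b for a, b in zip(y, bitstream(0.5, L, rng))]
print(value(y), value(z))
```

The gap between about 0.632 and exp(-0.5) comes from the degree-3 Bernstein approximation itself, not from stochastic noise. Fitting the coefficients b_k by least squares, or raising n, would tighten it. Chaining `y` into the AND gate shows why keeping intermediate results in stochastic form avoids a conversion step between operations.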