🤖 AI Summary
This work addresses the limited versatility of analog processing-using-memory (PUM) architectures, which struggle to support non-matrix-vector-multiplication operations and face significant challenges in integration with digital logic. To overcome these limitations, the authors propose a hybrid analog–digital PUM architecture that co-designs hardware and programming interfaces, enabling an efficient fusion of analog and digital computing paradigms within a unified framework. The architecture supports flexible data widths and diverse in-memory computations through optimized peripheral circuits, a coordinated control unit, a mixed-signal management mechanism, and a scalable array structure. Evaluated on AES encryption, CNN inference, and large language model tasks, the proposed system achieves speedups of 59.4×, 14.8×, and 40.8×, respectively, substantially outperforming a baseline combining analog PUM with a CPU.
📄 Abstract
Analog processing-using-memory (PUM; a.k.a. in-memory computing) makes use of electrical interactions inside memory arrays to perform bulk matrix-vector multiplication (MVM) operations. However, many popular matrix-based kernels need to execute non-MVM operations, which analog PUM cannot directly perform. To retain its energy efficiency, analog PUM architectures augment memory arrays with CMOS-based domain-specific fixed-function hardware to provide complete kernel functionality, but the difficulty of integrating such specialized CMOS logic with memory arrays has largely limited analog PUM to being an accelerator for machine learning inference, or for closely related kernels. An opportunity exists to harness analog PUM for general-purpose computation: recent works have shown that memory arrays can also perform Boolean PUM operations, albeit with very different supporting hardware and electrical signals than analog PUM.
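To make the contrast concrete, here is an illustrative software sketch (not the paper's hardware) of the two in-memory computing styles the abstract describes: analog PUM performs an MVM in one step because each output line's current is the sum of per-cell currents (Ohm's and Kirchhoff's laws), while digital PUM performs bulk bitwise Boolean operations such as NOR across stored words. The function names and the 4-bit word width are hypothetical choices for this example.

```python
def crossbar_mvm(conductances, voltages):
    """Model one analog MVM: each crossbar row's output current is a
    dot product of cell conductances (weights) and input voltages."""
    return [sum(g * v for g, v in zip(row, voltages)) for row in conductances]

def bulk_nor(word_a, word_b, bits=4):
    """Model a digital (Boolean) PUM primitive: a bulk NOR across two
    stored words, masked to the cell-row width."""
    return (~(word_a | word_b)) & ((1 << bits) - 1)

G = [[1.0, 0.5], [0.0, 2.0]]    # cell conductances (matrix weights)
v = [0.2, 0.4]                  # input voltages (vector elements)
print(crossbar_mvm(G, v))       # -> [0.4, 0.8]
print(bulk_nor(0b1010, 0b0100)) # -> 0b0001 = 1
```

The point of the hybrid design is that these two primitives need very different peripheral circuitry and signaling, which is why combining them in one array is the hard part.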
We propose DARTH-PUM, a general-purpose hybrid PUM architecture that tackles key hardware and software challenges in integrating analog PUM and digital PUM. We propose optimized peripheral circuitry, coordinating hardware to manage and interface between both types of PUM, an easy-to-use programming interface, and low-cost support for flexible data widths. These design elements allow us to build a practical PUM architecture that can execute kernels fully in memory and can scale easily to cater to domains ranging from embedded applications to large-scale data-driven computing. We show how three popular applications (AES encryption, convolutional neural networks, large language models) can map to and benefit from DARTH-PUM, with speedups of 59.4×, 14.8×, and 40.8× over an analog+CPU baseline.