🤖 AI Summary
Problem: Conventional architectures struggle to support large-scale linear optimization, and existing algorithms are poorly suited for resistive random-access memory (RRAM)-based in-memory computing (IMC).
Method: This paper proposes an algorithm–hardware co-designed distributed IMC framework. It is the first to implement the primal-dual hybrid gradient (PDHG) algorithm on RRAM, employing a symmetric block-matrix structure to unify operations across distributed crossbars. This design sharply reduces the write overhead of frequent reprogramming and improves robustness against device non-idealities. The approach combines analog-domain matrix computation with robust optimization techniques, and is evaluated with the MELISO+ physics-level simulation framework.
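One plausible reading of the symmetric block-matrix idea (a sketch, not the paper's stated construction) is to embed the constraint matrix A in the symmetric matrix K = [[0, Aᵀ], [A, 0]]. Because K is symmetric, a single crossbar programming of K can serve both matrix-vector products that PDHG needs, Ax and Aᵀy, so the array never has to be rewritten between primal and dual steps:

```python
import numpy as np

# Hypothetical illustration: embed A in the symmetric block matrix
# K = [[0, A^T], [A, 0]]. One programming of K onto crossbars then
# supplies both A x (dual step) and A^T y (primal step).
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0]])
m, n = A.shape
K = np.block([[np.zeros((n, n)), A.T],
              [A, np.zeros((m, m))]])
assert np.allclose(K, K.T)  # symmetric by construction

x = np.array([1.0, 1.0, 1.0])   # example primal vector
y = np.array([2.0, -1.0])       # example dual vector
out = K @ np.concatenate([x, y])
# The single product K @ [x; y] yields both pieces at once:
assert np.allclose(out[:n], A.T @ y)
assert np.allclose(out[n:], A @ x)
```

Symmetry also means a forward read and a "transposed" read of the array see the same conductance values, which is one way such a formulation could help distribute work uniformly across crossbars.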
Results: On large-scale linear programming tasks, the framework achieves solution accuracy comparable to GPU-based solvers, while delivering up to three orders-of-magnitude reduction in energy consumption and latency. These results demonstrate both the feasibility and superiority of IMC for solving large-scale optimization problems.
📝 Abstract
The exponential growth of computational workloads is surpassing the capabilities of conventional architectures, which are constrained by fundamental limits. In-memory computing (IMC) with RRAM offers a promising alternative, performing analog computation with significant gains in latency and energy use. However, existing algorithms developed for conventional architectures do not translate well to IMC, particularly for constrained optimization problems, where frequent matrix reprogramming is cost-prohibitive. Here we present a distributed in-memory primal-dual hybrid gradient (PDHG) method co-designed for arrays of RRAM devices. Our approach minimizes costly write cycles, incorporates robustness against device non-idealities, and leverages a symmetric block-matrix formulation to unify operations across distributed crossbars. We use a physics-based simulation framework, MELISO+, to evaluate performance under realistic device conditions. Benchmarking against GPU-accelerated solvers on large-scale linear programs shows that our RRAM-based solver achieves comparable accuracy while reducing energy consumption and latency by up to three orders of magnitude. These results constitute the first PDHG-based LP solver implemented on RRAM, showcasing the transformative potential of algorithm-hardware co-design for solving large-scale optimization through distributed in-memory computing.
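For readers unfamiliar with PDHG, the following is a minimal textbook sketch (not the paper's distributed RRAM implementation) for an equality-constrained LP, min cᵀx subject to Ax = b, x ≥ 0, with step sizes τ, σ satisfying τσ‖A‖² < 1. The property that matters for IMC is visible in the loop: each iteration uses only the two matrix-vector products Aᵀy and Ax, so A is programmed into the crossbars once and never rewritten:

```python
import numpy as np

def pdhg_lp(c, A, b, tau, sigma, iters=5000):
    """Primal-dual hybrid gradient for: min c^T x  s.t.  Ax = b, x >= 0.
    Each iteration needs only A^T y and A x, the operations that map onto
    in-memory crossbar reads; A itself is never reprogrammed."""
    m, n = A.shape
    x = np.zeros(n)
    y = np.zeros(m)
    for _ in range(iters):
        # Primal step: gradient move, then projection onto x >= 0.
        x_new = np.maximum(0.0, x - tau * (c + A.T @ y))
        # Dual step with the usual extrapolated primal iterate.
        y = y + sigma * (A @ (2 * x_new - x) - b)
        x = x_new
    return x, y

# Tiny example (illustrative, not from the paper):
# min x1 + 2*x2  s.t.  x1 + x2 = 1, x >= 0  ->  optimum x = (1, 0).
c = np.array([1.0, 2.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
x, y = pdhg_lp(c, A, b, tau=0.5, sigma=0.5)  # tau*sigma*||A||^2 = 0.5 < 1
```

On this toy problem the iterates settle at x ≈ (1, 0); on hardware, the reported co-design additionally contends with device non-idealities that this idealized float64 sketch ignores.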