🤖 AI Summary
RRAM-based in-memory computing (IMC) accelerators offer energy efficiency for DNNs but suffer from three key limitations: support only for static dot products, reliance on power-hungry analog-to-digital converters (ADCs), and weight mapping degradation due to device non-idealities. This paper proposes NL-DPE—the first fully analog IMC engine supporting arbitrary nonlinear functions and data-dependent multiplication. Its core innovations are: (1) an RRAM-based analog content-addressable memory (ACAM) that maps nonlinear operations onto decision trees and executes them entirely in the analog domain; (2) the first ADC-free analog nonlinear dot-product computation; and (3) a noise-aware fine-tuning (NAF) algorithm that eliminates hardware calibration while significantly enhancing robustness. Evaluations show NL-DPE achieves 28× higher energy efficiency and 249× higher throughput than GPU baselines, and outperforms state-of-the-art IMC accelerators by 22× in energy efficiency and 245× in speed—while maintaining high inference accuracy.
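The core idea of executing a nonlinear function as a decision tree can be illustrated in software: a scalar nonlinearity is discretized into threshold/value pairs (the kind of entries an ACAM row could store) and evaluated by walking thresholds like a balanced tree. This is a minimal illustrative sketch, not the paper's actual mapping algorithm; the sigmoid target, segment count, and lookup scheme are assumptions for demonstration.

```python
import math

def build_tree_lut(f, lo, hi, depth):
    """Approximate f on [lo, hi] with 2**depth piecewise-constant segments.
    Each entry is (lower_threshold, stored_output) -- illustrative of the
    threshold/value pairs an ACAM row might hold (an assumption here)."""
    n = 2 ** depth
    step = (hi - lo) / n
    return [(lo + i * step, f(lo + (i + 0.5) * step)) for i in range(n)]

def tree_eval(lut, x):
    """Binary-search the segment thresholds, i.e. walk a balanced
    depth-log2(n) decision tree, and return the stored constant."""
    lo_idx, hi_idx = 0, len(lut) - 1
    while lo_idx < hi_idx:
        mid = (lo_idx + hi_idx + 1) // 2
        if x >= lut[mid][0]:
            lo_idx = mid
        else:
            hi_idx = mid - 1
    return lut[lo_idx][1]

sigmoid = lambda z: 1.0 / (1.0 + math.exp(-z))
lut = build_tree_lut(sigmoid, -6.0, 6.0, depth=6)   # 64 segments
err = max(abs(tree_eval(lut, x / 10) - sigmoid(x / 10))
          for x in range(-60, 61))
print(f"max abs error over [-6, 6]: {err:.4f}")
```

Deeper trees (more segments) trade area for accuracy; in hardware the threshold comparisons would happen in parallel in the analog domain rather than as a sequential search.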
📝 Abstract
Resistive Random Access Memory (RRAM) based in-memory computing (IMC) accelerators offer significant performance and energy advantages for deep neural networks (DNNs), but face three major limitations: (1) they support only *static* dot-product operations and cannot accelerate arbitrary non-linear functions or data-dependent multiplications essential to modern LLMs; (2) they demand large, power-hungry analog-to-digital converter (ADC) circuits; and (3) mapping model weights to device conductance introduces errors from cell nonidealities. These challenges hinder scalable and accurate IMC acceleration as models grow.
We propose NL-DPE, a Non-Linear Dot Product Engine that overcomes these barriers. NL-DPE augments crosspoint arrays with RRAM-based Analog Content Addressable Memory (ACAM) to execute arbitrary non-linear functions and data-dependent matrix multiplications in the analog domain by transforming them into decision trees, fully eliminating ADCs. To address device noise, NL-DPE uses software-based Noise-Aware Fine-tuning (NAF), requiring no in-device calibration. Experiments show that NL-DPE delivers 28× energy efficiency and 249× speedup over a GPU baseline, and 22× energy efficiency and 245× speedup over existing IMC accelerators, while maintaining high accuracy.
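The idea behind noise-aware fine-tuning can be sketched in a few lines: during training, each forward pass samples a perturbed copy of the weights that mimics RRAM conductance variation, so the learned weights become robust to the noise they will see at inference. This toy example uses a linear layer, multiplicative Gaussian noise at a 5% level, and a straight-through gradient update; all of these modeling choices are assumptions for illustration, not the paper's measured device statistics or actual NAF algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
NOISE = 0.05  # assumed 5% multiplicative conductance noise (illustrative)

def noisy(W):
    """Sample a weight matrix as an RRAM array might realize it."""
    return W * (1.0 + NOISE * rng.standard_normal(W.shape))

# Toy regression task: recover W_true from (X, Y) pairs.
X = rng.standard_normal((256, 8))
W_true = rng.standard_normal((4, 8))
Y = X @ W_true.T

W = np.zeros_like(W_true)
lr = 0.05
for _ in range(500):
    Wn = noisy(W)                     # forward pass sees a noisy weight sample
    err = X @ Wn.T - Y
    W -= lr * (err.T @ X) / len(X)    # straight-through update to clean weights

# Evaluate under a fresh noise draw, as deployed hardware would behave.
loss = np.mean((X @ noisy(W).T - Y) ** 2)
print(f"post-NAF noisy-inference MSE: {loss:.4f}")
```

Because the noise is injected only in the forward pass, the update is unbiased toward the clean least-squares solution while the model learns to tolerate perturbed weights, which is why no per-device calibration step is needed.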