NL-DPE: An Analog In-memory Non-Linear Dot Product Engine for Efficient CNN and LLM Inference

📅 2025-11-17
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
RRAM-based in-memory computing (IMC) accelerators offer energy efficiency for DNNs but suffer from three key limitations: support only for static dot products, reliance on power-hungry analog-to-digital converters (ADCs), and weight mapping degradation due to device non-idealities. This paper proposes NL-DPE—the first fully analog IMC engine supporting arbitrary nonlinear functions and data-dependent multiplication. Its core innovations are: (1) an RRAM-based analog content-addressable memory (ACAM) that maps nonlinear operations onto decision trees and executes them entirely in the analog domain; (2) the first ADC-free analog nonlinear dot-product computation; and (3) a noise-aware fine-tuning (NAF) algorithm that eliminates hardware calibration while significantly enhancing robustness. Evaluations show NL-DPE achieves 28× higher energy efficiency and 249× higher throughput than GPU baselines, and outperforms state-of-the-art IMC accelerators by 22× in energy efficiency and 245× in speed—while maintaining high inference accuracy.
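The decision-tree mapping can be illustrated in software: a nonlinear activation is discretized into input intervals, each playing the role of one ACAM row that matches an analog input range and emits a stored output level. A minimal NumPy sketch of the idea, using a tanh-style GELU approximation as the target nonlinearity (the interval table and lookup here are illustrative, not NL-DPE's actual circuit):

```python
import numpy as np

def gelu(x):
    """tanh approximation of GELU, a nonlinearity common in LLMs."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def build_interval_table(f, lo=-4.0, hi=4.0, leaves=64):
    """Discretize f into `leaves` intervals; each interval stands in for one
    ACAM row storing an (input range -> output level) pair."""
    edges = np.linspace(lo, hi, leaves + 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return edges, f(centers)

def tree_lookup(x, edges, levels):
    """Evaluate f by range matching; searchsorted is the software analog of
    descending a balanced decision tree over the interval thresholds."""
    idx = np.clip(np.searchsorted(edges, x) - 1, 0, len(levels) - 1)
    return levels[idx]

edges, levels = build_interval_table(gelu)
x = np.linspace(-3.5, 3.5, 1001)
max_err = float(np.max(np.abs(tree_lookup(x, edges, levels) - gelu(x))))
```

With 64 leaves the piecewise-constant approximation stays well under 0.1 absolute error over this range; the paper's contribution is executing the match-and-read step entirely in the analog domain, with no ADC in the loop.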

📝 Abstract
Resistive Random Access Memory (RRAM) based in-memory computing (IMC) accelerators offer significant performance and energy advantages for deep neural networks (DNNs), but face three major limitations: (1) they support only static dot-product operations and cannot accelerate arbitrary non-linear functions or data-dependent multiplications essential to modern LLMs; (2) they demand large, power-hungry analog-to-digital converter (ADC) circuits; and (3) mapping model weights to device conductance introduces errors from cell nonidealities. These challenges hinder scalable and accurate IMC acceleration as models grow. We propose NL-DPE, a Non-Linear Dot Product Engine that overcomes these barriers. NL-DPE augments crosspoint arrays with RRAM-based Analog Content Addressable Memory (ACAM) to execute arbitrary non-linear functions and data-dependent matrix multiplications in the analog domain by transforming them into decision trees, fully eliminating ADCs. To address device noise, NL-DPE uses software-based Noise Aware Fine-tuning (NAF), requiring no in-device calibration. Experiments show that NL-DPE delivers 28× energy efficiency and 249× speedup over a GPU baseline, and 22× energy efficiency and 245× speedup over existing IMC accelerators, while maintaining high accuracy.
Problem

Research questions and friction points this paper is trying to address.

Accelerating non-linear functions and data-dependent multiplications in modern LLMs
Eliminating power-hungry analog-to-digital converter circuits in IMC accelerators
Addressing device noise and nonidealities in resistive memory mapping
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses RRAM-based ACAM for non-linear analog functions
Eliminates ADCs by transforming operations to decision trees
Employs Noise Aware Fine-tuning for device noise compensation
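The noise-aware fine-tuning idea can be sketched in software: weights are perturbed during training the way RRAM conductance noise would perturb them at inference time, so the learned solution is already robust when programmed onto noisy devices. A toy linear-regression sketch (the multiplicative Gaussian noise model and all hyperparameters are illustrative assumptions, not the paper's exact NAF recipe):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy task standing in for one DNN layer: recover w_true from y = X @ w_true.
X = rng.normal(size=(256, 8))
w_true = rng.normal(size=8)
y = X @ w_true

def train(noise_std, steps=2000, lr=0.05, tail=500):
    """Gradient descent with multiplicative weight noise injected in the
    forward pass (noise_std=0.0 gives ordinary training). Averaging the
    tail of the trajectory smooths out the stochastic updates."""
    w, acc = np.zeros(8), np.zeros(8)
    for t in range(steps):
        eps = noise_std * rng.normal(size=w.shape)
        w_noisy = w * (1.0 + eps)  # assumed device-noise model
        grad = (1.0 + eps) * (X.T @ (X @ w_noisy - y) / len(X))
        w -= lr * grad
        if t >= steps - tail:
            acc += w
    return acc / tail

def deployed_mse(w, noise_std, trials=500):
    """Average loss when trained weights are programmed onto noisy cells."""
    losses = [np.mean((X @ (w * (1.0 + noise_std * rng.normal(size=w.shape))) - y) ** 2)
              for _ in range(trials)]
    return float(np.mean(losses))

w_naf = train(noise_std=0.5)    # noise-aware fine-tuning
w_plain = train(noise_std=0.0)  # ordinary training
```

Evaluated under the same deployment noise, the noise-aware weights incur lower average error than the plainly trained ones: injecting noise during training acts as a regularizer matched to the device noise, which is the intuition behind NAF's calibration-free robustness.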
Lei Zhao
Artificial Intelligence Research Lab (AIRL), Hewlett Packard Labs, USA
Luca Buonanno
Artificial Intelligence Research Lab (AIRL), Hewlett Packard Labs, USA
Archit Gajjar
Artificial Intelligence Research Lab (AIRL), Hewlett Packard Labs, USA
John Moon
Artificial Intelligence Research Lab (AIRL), Hewlett Packard Labs, USA
Aishwarya Natarajan
Artificial Intelligence Research Lab (AIRL), Hewlett Packard Labs, USA
Sergey Serebryakov
Artificial Intelligence Research Lab (AIRL), Hewlett Packard Labs, USA
Ron M. Roth
Technion - Israel Institute of Technology
Xia Sheng
Artificial Intelligence Research Lab (AIRL), Hewlett Packard Labs, USA
Youtao Zhang
University of Pittsburgh, Pittsburgh, PA, USA
Paolo Faraboschi
Hewlett Packard Labs
Jim Ignowski
Artificial Intelligence Research Lab (AIRL), Hewlett Packard Labs, USA
Giacomo Pedretti
Research Scientist, Hewlett Packard Laboratories