NL-DPE: An Analog In-memory Non-Linear Dot Product Engine for Efficient CNN and LLM Inference

📅 2025-11-17
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
RRAM-based in-memory computing (IMC) accelerators offer energy efficiency for DNNs but suffer from three key limitations: support only for static dot products, reliance on power-hungry analog-to-digital converters (ADCs), and weight mapping degradation due to device non-idealities. This paper proposes NL-DPE—the first fully analog IMC engine supporting arbitrary nonlinear functions and data-dependent multiplication. Its core innovations are: (1) an RRAM-based analog content-addressable memory (ACAM) that maps nonlinear operations onto decision trees and executes them entirely in the analog domain; (2) the first ADC-free analog nonlinear dot-product computation; and (3) a noise-aware fine-tuning (NAF) algorithm that eliminates hardware calibration while significantly enhancing robustness. Evaluations show NL-DPE achieves 28× higher energy efficiency and 249× higher throughput than GPU baselines, and outperforms state-of-the-art IMC accelerators by 22× in energy efficiency and 245× in speed—while maintaining high inference accuracy.
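The decision-tree mapping can be illustrated in software: a nonlinear activation is discretized into input intervals, each playing the role of one ACAM row that matches an analog input range and emits a stored output level. A minimal NumPy sketch of the idea, using a tanh-style GELU approximation as the target nonlinearity (the interval table and lookup here are illustrative, not NL-DPE's actual circuit):

```python
import numpy as np

def gelu(x):
    """tanh approximation of GELU, a nonlinearity common in LLMs."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def build_interval_table(f, lo=-4.0, hi=4.0, leaves=64):
    """Discretize f into `leaves` intervals; each interval stands in for one
    ACAM row storing an (input range -> output level) pair."""
    edges = np.linspace(lo, hi, leaves + 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return edges, f(centers)

def tree_lookup(x, edges, levels):
    """Evaluate f by range matching; searchsorted is the software analog of
    descending a balanced decision tree over the interval thresholds."""
    idx = np.clip(np.searchsorted(edges, x) - 1, 0, len(levels) - 1)
    return levels[idx]

edges, levels = build_interval_table(gelu)
x = np.linspace(-3.5, 3.5, 1001)
max_err = float(np.max(np.abs(tree_lookup(x, edges, levels) - gelu(x))))
```

With 64 leaves the piecewise-constant approximation stays well under 0.1 absolute error over this range; the paper's contribution is executing the match-and-read step entirely in the analog domain, with no ADC in the loop.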

📝 Abstract
Resistive Random Access Memory (RRAM) based in-memory computing (IMC) accelerators offer significant performance and energy advantages for deep neural networks (DNNs), but face three major limitations: (1) they support only static dot-product operations and cannot accelerate arbitrary non-linear functions or data-dependent multiplications essential to modern LLMs; (2) they demand large, power-hungry analog-to-digital converter (ADC) circuits; and (3) mapping model weights to device conductance introduces errors from cell nonidealities. These challenges hinder scalable and accurate IMC acceleration as models grow. We propose NL-DPE, a Non-Linear Dot Product Engine that overcomes these barriers. NL-DPE augments crosspoint arrays with RRAM-based Analog Content Addressable Memory (ACAM) to execute arbitrary non-linear functions and data-dependent matrix multiplications in the analog domain by transforming them into decision trees, fully eliminating ADCs. To address device noise, NL-DPE uses software-based Noise Aware Fine-tuning (NAF), requiring no in-device calibration. Experiments show that NL-DPE delivers 28× energy efficiency and 249× speedup over a GPU baseline, and 22× energy efficiency and 245× speedup over existing IMC accelerators, while maintaining high accuracy.
Problem

Research questions and friction points this paper is trying to address.

Accelerating non-linear functions and data-dependent multiplications in modern LLMs
Eliminating power-hungry analog-to-digital converter circuits in IMC accelerators
Addressing device noise and nonidealities in resistive memory mapping
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses RRAM-based ACAM for non-linear analog functions
Eliminates ADCs by transforming operations to decision trees
Employs Noise Aware Fine-tuning for device noise compensation
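The noise-aware fine-tuning idea can be sketched in software: weights are perturbed during training the way RRAM conductance noise would perturb them at inference time, so the learned solution is already robust when programmed onto noisy devices. A toy linear-regression sketch (the multiplicative Gaussian noise model and all hyperparameters are illustrative assumptions, not the paper's exact NAF recipe):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy task standing in for one DNN layer: recover w_true from y = X @ w_true.
X = rng.normal(size=(256, 8))
w_true = rng.normal(size=8)
y = X @ w_true

def train(noise_std, steps=2000, lr=0.05, tail=500):
    """Gradient descent with multiplicative weight noise injected in the
    forward pass (noise_std=0.0 gives ordinary training). Averaging the
    tail of the trajectory smooths out the stochastic updates."""
    w, acc = np.zeros(8), np.zeros(8)
    for t in range(steps):
        eps = noise_std * rng.normal(size=w.shape)
        w_noisy = w * (1.0 + eps)  # assumed device-noise model
        grad = (1.0 + eps) * (X.T @ (X @ w_noisy - y) / len(X))
        w -= lr * grad
        if t >= steps - tail:
            acc += w
    return acc / tail

def deployed_mse(w, noise_std, trials=500):
    """Average loss when trained weights are programmed onto noisy cells."""
    losses = [np.mean((X @ (w * (1.0 + noise_std * rng.normal(size=w.shape))) - y) ** 2)
              for _ in range(trials)]
    return float(np.mean(losses))

w_naf = train(noise_std=0.5)    # noise-aware fine-tuning
w_plain = train(noise_std=0.0)  # ordinary training
```

Evaluated under the same deployment noise, the noise-aware weights incur lower average error than the plainly trained ones: injecting noise during training acts as a regularizer matched to the device noise, which is the intuition behind NAF's calibration-free robustness.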
Lei Zhao
Artificial Intelligence Research Lab (AIRL), Hewlett Packard Labs, USA
Luca Buonanno
Artificial Intelligence Research Lab (AIRL), Hewlett Packard Labs, USA
Archit Gajjar
Artificial Intelligence Research Lab (AIRL), Hewlett Packard Labs, USA
John Moon
Artificial Intelligence Research Lab (AIRL), Hewlett Packard Labs, USA
Aishwarya Natarajan
Artificial Intelligence Research Lab (AIRL), Hewlett Packard Labs, USA
Sergey Serebryakov
Artificial Intelligence Research Lab (AIRL), Hewlett Packard Labs, USA
Ron M. Roth
Technion - Israel Institute of Technology
Xia Sheng
Artificial Intelligence Research Lab (AIRL), Hewlett Packard Labs, USA
Youtao Zhang
University of Pittsburgh, Pittsburgh, PA, USA
Paolo Faraboschi
Hewlett Packard Labs
Jim Ignowski
Artificial Intelligence Research Lab (AIRL), Hewlett Packard Labs, USA
Giacomo Pedretti
Research Scientist, Hewlett Packard Laboratories