ReCross: Efficient Embedding Reduction Scheme for In-Memory Computing using ReRAM-Based Crossbar

📅 2025-09-12
🤖 AI Summary
To address the memory bandwidth bottlenecks and high energy consumption caused by large, sparse embedding layers in deep learning recommendation models (DLRMs), this paper proposes ReCross, an in-memory computing scheme tailored to ReRAM crossbar arrays. It combines three techniques: (1) grouping and mapping co-occurring embeddings to improve crossbar utilization; (2) replicating frequently accessed embeddings across crossbars; and (3) dynamically selecting in-memory processing operations at runtime, using a newly designed dynamic switch ADC circuit that accounts for energy trade-offs. Compared with state-of-the-art in-memory computing approaches, ReCross achieves a 3.97× reduction in execution time and a 6.1× improvement in energy efficiency.

📝 Abstract
Deep learning-based recommendation models (DLRMs) are widely deployed in commercial applications to enhance user experience. However, the large and sparse embedding layers in these models impose substantial memory bandwidth bottlenecks due to high memory access costs and irregular access patterns, leading to increased inference time and energy consumption. While resistive random access memory (ReRAM) based crossbars offer a fast and energy-efficient solution through in-memory embedding reduction operations, naively mapping embeddings onto crossbar arrays leads to poor crossbar utilization and thus degrades performance. We present ReCross, an efficient ReRAM-based in-memory computing (IMC) scheme designed to minimize execution time and enhance energy efficiency in DLRM embedding reduction. ReCross co-optimizes embedding access patterns and ReRAM crossbar characteristics by intelligently grouping and mapping co-occurring embeddings, replicating frequently accessed embeddings across crossbars, and dynamically selecting in-memory processing operations using a newly designed dynamic switch ADC circuit that considers runtime energy trade-offs. Experimental results demonstrate that ReCross achieves a 3.97x reduction in execution time and a 6.1x improvement in energy efficiency compared to state-of-the-art IMC approaches.
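
As a rough illustration of what "in-memory embedding reduction" means, the sketch below compares a conventional gather-and-sum with the crossbar view, where the selected rows (wordlines) are driven together and each column (bitline) yields the sum directly. It is an idealized software model, not the paper's implementation: the function names, the toy table, and the assumption of sum pooling with ideal analog accumulation are all illustrative.

```python
def reduce_by_gather(table, indices):
    """Baseline: fetch each embedding row from memory and accumulate it."""
    dim = len(table[0])
    out = [0.0] * dim
    for i in indices:
        for d in range(dim):
            out[d] += table[i][d]
    return out


def reduce_in_crossbar(table, indices):
    """Idealized crossbar: a multi-hot wordline vector times the stored matrix.

    Each column sum models the current collected on one bitline
    (assumption: ideal cells, no ADC quantization).
    """
    wordlines = [0] * len(table)
    for i in indices:
        wordlines[i] += 1  # activate the row holding embedding i
    dim = len(table[0])
    return [
        sum(wordlines[r] * table[r][d] for r in range(len(table)))
        for d in range(dim)
    ]


# Hypothetical toy table: 8 embeddings of dimension 4.
table = [[float(r * 10 + d) for d in range(4)] for r in range(8)]
indices = [1, 4, 6]

gathered = reduce_by_gather(table, indices)
in_memory = reduce_in_crossbar(table, indices)
```

Both functions return the same vector; the difference is that the gather path moves every selected row across the memory bus, whereas the crossbar path only reads out the reduced result.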

Problem

Research questions and friction points this paper is trying to address.

Addresses memory bandwidth bottlenecks in deep learning recommendation models
Optimizes embedding mapping for ReRAM crossbars to improve utilization
Reduces inference time and energy consumption via in-memory computing
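
One way to see why mapping matters is to count how many crossbars a single query has to activate. The sketch below assumes a naive mapping that places embeddings on crossbars in ID order; the helper names, the crossbar size, and the sample queries are hypothetical and not taken from the paper.

```python
def naive_mapping(num_embeddings, rows_per_crossbar):
    """Assign embedding IDs to crossbars contiguously, in ID order."""
    return {e: e // rows_per_crossbar for e in range(num_embeddings)}


def crossbars_activated(query, mapping):
    """Number of distinct crossbars that hold at least one requested embedding."""
    return len({mapping[e] for e in query})


# Hypothetical workload: 12 embeddings, 4 rows per crossbar, 3 lookups per query.
mapping = naive_mapping(num_embeddings=12, rows_per_crossbar=4)
queries = [[0, 5, 10], [1, 5, 9], [0, 4, 8]]

per_query = [crossbars_activated(q, mapping) for q in queries]
average = sum(per_query) / len(per_query)
```

In this toy workload every query touches all three crossbars even though it only reads three rows, which is the kind of poor utilization a mapping-aware scheme tries to avoid.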

Innovation

Methods, ideas, or system contributions that make the work stand out.

Grouping co-occurring embeddings for efficiency
Replicating frequent embeddings across crossbars
Dynamic switch ADC for energy optimization
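
The grouping idea can be sketched with a simple greedy heuristic: seed each crossbar with the most frequently accessed unassigned embedding, then fill it with the embeddings that co-occur most often with the current members. This is an assumed stand-in for illustration only; the paper's actual grouping algorithm, its replication policy, and the ADC mode selection are not reproduced here, and all names and the sample trace are hypothetical.

```python
from collections import Counter
from itertools import combinations


def group_by_cooccurrence(queries, rows_per_crossbar):
    """Greedily pack co-occurring embeddings into groups of at most rows_per_crossbar."""
    freq = Counter(e for q in queries for e in set(q))
    pair = Counter()
    for q in queries:
        for a, b in combinations(sorted(set(q)), 2):
            pair[(a, b)] += 1  # key is always (smaller id, larger id)

    def affinity(e, group):
        # Total co-occurrence count between e and the current group members.
        return sum(pair[(min(e, g), max(e, g))] for g in group)

    unassigned = set(freq)
    groups = []
    while unassigned:
        # Seed with the most frequent embedding; break ties by smallest ID.
        seed = min(unassigned, key=lambda e: (-freq[e], e))
        group = [seed]
        unassigned.remove(seed)
        while len(group) < rows_per_crossbar and unassigned:
            best = min(unassigned, key=lambda e: (-affinity(e, group), e))
            group.append(best)
            unassigned.remove(best)
        groups.append(group)
    return groups


# Hypothetical access trace: two clusters of embeddings that appear together.
queries = [[0, 5, 10], [0, 5, 10], [1, 6, 11], [1, 6, 11], [0, 5]]
groups = group_by_cooccurrence(queries, rows_per_crossbar=3)
```

With this trace, every sample query falls entirely inside one group, so it would activate a single crossbar. Replicating hot embeddings into several groups would be a further step on top of this and is not modeled in the sketch.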
Yu-Hong Lai
National Taiwan University, Taipei, Taiwan
Chieh-Lin Tsai
National Taiwan University, Taipei, Taiwan
Wen Sheng Lim
National Taiwan University, Taipei, Taiwan
Han-Wen Hu
Macronix International Co., Ltd., Hsinchu, Taiwan
Tei-Wei Kuo
National Taiwan University, Taipei, Taiwan; Delta Electronics, Taipei, Taiwan
Yuan-Hao Chang
Professor, Dept. of CSIE, National Taiwan University; IEEE Fellow
Computer System, Computer Architecture, Embedded System, Operating System, Non-volatile Memory