Taking Cryptography Out of the Data Path via Near-Memory Processing in DRAM

2026-05-19
Citations: 0
Influential: 0

AI Summary
Conventional processors incur significant latency and energy overhead when running cryptographic algorithms such as AES-128 and SHA-256 because of the memory wall. This work presents the first implementation and systematic evaluation of in-DRAM parallel computation for these algorithms on a real multi-rank UPMEM Processing-in-Memory (PIM) system. A single rank performs below a conventional CPU, but using all available ranks yields substantially higher overall throughput for cryptographic workloads. The results indicate that PIM architectures can significantly accelerate cryptographic workloads (both encryption and hashing) in practical deployments, and they highlight PIM's scalability and potential for energy-efficient cryptographic processing.
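The single-rank versus all-ranks observation can be read as a break-even condition. As an idealized sketch only, assume throughput scales linearly with the number of ranks; this assumption is not stated in the source and ignores host-to-PIM transfer overhead. Let $T_{\text{rank}}$ be the throughput of one rank, $T_{\text{CPU}}$ the CPU throughput, and $R$ the number of ranks in use (all three symbols are introduced here for illustration):

```latex
T_{\text{PIM}}(R) \approx R \cdot T_{\text{rank}},
\qquad
T_{\text{PIM}}(R) > T_{\text{CPU}}
\iff
R > \frac{T_{\text{CPU}}}{T_{\text{rank}}}
```

Because a single rank is slower than the CPU, $T_{\text{CPU}} / T_{\text{rank}} > 1$, so under this model PIM only pulls ahead once more than one rank is used.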
Abstract
Cryptographic algorithms such as AES-128 and SHA-256 are fundamental to ensuring data security and integrity. Although these algorithms are computationally efficient, their performance is often constrained by processor-centric architectures (e.g., CPUs, GPUs), primarily due to the memory bottleneck. This constraint leads to increased latency and higher energy consumption, particularly when handling large volumes of data. To overcome these challenges, Processing-in-Memory (PIM) has emerged as a promising architectural paradigm that allows computation to occur directly within or near memory units. By minimizing data movement between the processor and memory, PIM can significantly accelerate cryptographic algorithms while improving energy efficiency. Several prior works have demonstrated the effectiveness of PIM in accelerating cryptographic algorithms. However, none of them has extensively demonstrated the potential of a real-world PIM system. In this paper, we investigate the potential and limitations of real-world PIM in accelerating cryptographic algorithms. As part of our methodology, we use the UPMEM PIM architecture to assess the scalability of cryptographic algorithms. When these algorithms run on a single rank, their performance remains below that of modern CPUs. However, distributing the computation across multiple ranks significantly improves performance. When all available ranks are utilized, real-world PIM can accelerate cryptographic algorithms more effectively.
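To make "distributing the computation across multiple ranks" concrete, here is a minimal host-side sketch of the data partitioning such a scheme would need. It is not the paper's implementation and does not use the UPMEM SDK: `partition_blocks`, `hash_partitions`, and `num_workers` are hypothetical names, and the "workers" are simulated sequentially with Python's `hashlib`. Note that SHA-256 is sequential within a single message, so the per-slice digests below are independent hashes, not a parallel computation of one digest over the whole buffer.

```python
import hashlib

AES_BLOCK_BYTES = 16  # AES processes data in 128-bit (16-byte) blocks


def partition_blocks(total_blocks: int, num_workers: int) -> list[tuple[int, int]]:
    """Split total_blocks into contiguous (start, count) ranges, as evenly as possible.

    num_workers is a hypothetical stand-in for the number of PIM compute units.
    """
    if num_workers <= 0:
        raise ValueError("num_workers must be positive")
    base, extra = divmod(total_blocks, num_workers)
    ranges = []
    start = 0
    for worker in range(num_workers):
        # The first `extra` workers take one additional block each.
        count = base + (1 if worker < extra else 0)
        ranges.append((start, count))
        start += count
    return ranges


def hash_partitions(data: bytes, num_workers: int) -> list[str]:
    """Hash each worker's slice independently with SHA-256 (simulated sequentially)."""
    if len(data) % AES_BLOCK_BYTES != 0:
        raise ValueError("data length must be a multiple of 16 bytes")
    total_blocks = len(data) // AES_BLOCK_BYTES
    digests = []
    for start, count in partition_blocks(total_blocks, num_workers):
        lo = start * AES_BLOCK_BYTES
        hi = lo + count * AES_BLOCK_BYTES
        digests.append(hashlib.sha256(data[lo:hi]).hexdigest())
    return digests
```

Aligning slices to 16-byte boundaries keeps the same partitioning usable for block-wise AES modes in which blocks can be processed independently; the choice of contiguous ranges is an illustrative design decision, not something the source specifies.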
Problem

Research questions and friction points this paper is trying to address.

cryptographic algorithms
memory bottleneck
Processing-in-Memory
data movement
energy efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Processing-in-Memory
Near-Memory Computing
Cryptographic Acceleration
UPMEM
Memory Bottleneck