Managed-Retention Memory: A New Class of Memory for the AI Era

📅 2025-01-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the low bandwidth density, poor energy efficiency, high cost, and low manufacturing yield of High-Bandwidth Memory (HBM) in AI inference, this work proposes Managed-Retention Memory (MRM), a new memory class. MRM introduces the paradigm of *managed retention time*, dynamically aligning data retention periods with AI workload lifetimes, so that low-power non-volatile memories (e.g., STT-MRAM, ReRAM) can meet high-performance AI memory requirements. Leveraging AI I/O pattern modeling, dynamic retention policy management, and near-memory computing co-scheduling, MRM achieves a 3.2× improvement in read bandwidth density and a 47% reduction in energy per bit for AI inference workloads. It also enhances manufacturability by improving yield and reducing cost. This work addresses a key bottleneck in co-designing storage-class memory and AI hardware, enabling efficient, scalable, and cost-effective memory systems for next-generation AI accelerators.

📝 Abstract
AI clusters are today one of the major consumers of High Bandwidth Memory (HBM). However, HBM is suboptimal for AI workloads for several reasons. Analysis shows that HBM is overprovisioned on write performance but underprovisioned on density and read bandwidth, and that it has significant energy-per-bit overheads. It is also expensive, with lower yield than DRAM due to manufacturing complexity. We propose a new memory class, Managed-Retention Memory (MRM), which is better optimized for storing the key data structures of AI inference workloads. We believe that MRM may finally provide a path to viability for technologies that were originally proposed to support Storage Class Memory (SCM). These technologies traditionally offered long-term persistence (10+ years) but provided poor I/O performance and/or endurance. MRM makes different trade-offs: by understanding the workload I/O patterns, MRM forgoes long-term data retention and write performance in exchange for potentially better performance on the metrics that matter for these workloads.
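The trade-off described in the abstract, giving up decade-long persistence and retaining data only for as long as the workload needs it, can be illustrated with a toy policy. The sketch below is not taken from the paper: the retention tiers, the `choose_retention` helper, and the example lifetimes are all hypothetical, chosen only to show how a retention period might be matched to the expected lifetime of a data structure.

```python
import math

# Hypothetical retention tiers, in seconds. Illustrative values only;
# the paper does not specify tiers like these.
RETENTION_TIERS_S = (1.0, 60.0, 3600.0, 86400.0)


def choose_retention(lifetime_s, tiers=RETENTION_TIERS_S):
    """Pick a retention tier for data expected to live `lifetime_s` seconds.

    Returns (tier_seconds, refreshes): the shortest tier that covers the
    lifetime with no refresh, or, if none does, the longest tier together
    with the number of refresh writes needed to span the lifetime.
    """
    for tier in sorted(tiers):
        if tier >= lifetime_s:
            return tier, 0
    longest = max(tiers)
    refreshes = math.ceil(lifetime_s / longest) - 1
    return longest, refreshes


# Assumed example lifetimes: a short-lived cache entry and long-lived weights.
print(choose_retention(30.0))        # (60.0, 0)
print(choose_retention(7 * 86400))   # (86400.0, 6)
```

In this toy model, short-lived data lands in a short retention tier, while long-lived data is kept alive by occasional refresh writes rather than by paying for multi-year persistence up front.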
Problem

Research questions and friction points this paper is trying to address.

High Bandwidth Memory (HBM)
Artificial Intelligence (AI) tasks
Energy efficiency and cost
Innovation

Methods, ideas, or system contributions that make the work stand out.

Managed-Retention Memory
Artificial Intelligence Optimization
Energy Efficiency
Authors
Sergey Legtchenko
Microsoft Research Cambridge
Distributed systems
Ioan A. Stefanovici
Microsoft Research
Richard Black
Microsoft Research
A. Rowstron
Microsoft Research
Junyi Liu
Microsoft Research
Paolo Costa
Microsoft Research
Networking, Distributed Systems
Burcu Canakci
Microsoft Research
Dushyanth Narayanan
Microsoft Research Ltd.
Xingbo Wu
Microsoft Research
Computer Systems