From Minutes to Seconds: Redefining the Five-Minute Rule for AI-Era Memory Hierarchies

📅 2025-11-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper revisits and updates the 1987 “Five-Minute Rule” to reflect AI-era memory hierarchies dominated by GPUs and ultra-high-performance SSDs, addressing key limitations of the original rule: its neglect of host cost, physical constraints, and workload dynamics. Method: We propose a dynamic resource-planning framework integrating DRAM bandwidth/capacity modeling, physics-aware SSD performance modeling, and workload-aware analysis. Leveraging the MQSim-Next simulator, we conduct sensitivity analyses to quantify how cache thresholds shift under AI workloads. Contribution/Results: We demonstrate that the DRAM–NAND caching threshold has collapsed from minutes to seconds in AI scenarios, the first quantification of this shift. We establish NAND flash as a viable *active data layer*, enabling new hardware–software co-design paradigms. Two case studies validate the framework’s ability to expand the system design space, offering principled guidance for GPU-centric storage architecture.

📝 Abstract
In 1987, Jim Gray and Gianfranco Putzolu introduced the five-minute rule, a simple, storage-memory-economics-based heuristic for deciding when data should live in DRAM rather than on storage. Subsequent revisits to the rule largely retained that economics-only view, leaving host costs, feasibility limits, and workload behavior out of scope. This paper revisits the rule from first principles, integrating host costs, DRAM bandwidth/capacity, and physics-grounded models of SSD performance and cost, and then embedding these elements in a constraint- and workload-aware framework that yields actionable provisioning guidance. We show that, for modern AI platforms, especially GPU-centric hosts paired with ultra-high-IOPS SSDs engineered for fine-grained random access, the DRAM-to-flash caching threshold collapses from minutes to a few seconds. This shift reframes NAND flash memory as an active data tier and exposes a broad research space across the hardware-software stack. We further introduce MQSim-Next, a calibrated SSD simulator that supports validation and sensitivity analysis and facilitates future architectural and system research. Finally, we present two concrete case studies that showcase the software system design space opened by such a memory-hierarchy paradigm shift. Overall, we turn a classical heuristic into an actionable, feasibility-aware analysis and provisioning framework and set the stage for further research on AI-era memory hierarchies.
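The economics behind the rule can be made concrete. Under Gray and Putzolu's original cost balance, a page is worth caching in DRAM if it is re-accessed within a break-even interval of roughly (device price / device IOPS) divided by (DRAM price per page). A minimal sketch of that arithmetic follows; the dollar and IOPS figures are illustrative assumptions, not numbers taken from the paper:

```python
# Break-even caching interval in the style of Gray & Putzolu (1987):
# keep a page in DRAM if it is re-accessed within this many seconds.
PAGE_BYTES = 4096

def break_even_seconds(device_price_usd: float,
                       device_iops: float,
                       dram_usd_per_gib: float,
                       page_bytes: int = PAGE_BYTES) -> float:
    # Capital cost of sustaining one random page access per second on the device.
    cost_per_sustained_iops = device_price_usd / device_iops
    # Capital cost of pinning one page in DRAM.
    pages_per_gib = (1 << 30) // page_bytes
    dram_cost_per_page = dram_usd_per_gib / pages_per_gib
    return cost_per_sustained_iops / dram_cost_per_page

# Illustrative (assumed) figures: a ~$300 SSD sustaining ~1.5M random-read
# IOPS versus DRAM at ~$3/GiB.
print(f"{break_even_seconds(300.0, 1.5e6, 3.0):.1f} s")
```

With these assumed prices the interval lands in the tens of seconds rather than minutes; raising device IOPS shrinks the numerator, which is the mechanism behind the minutes-to-seconds collapse the paper quantifies.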
Problem

Research questions and friction points this paper is trying to address.

Redefining the five-minute rule for AI-era memory hierarchies
Integrating host costs and SSD performance into caching decisions
Quantifying the collapse of the DRAM-to-flash caching threshold from minutes to seconds
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates host costs and SSD performance models
Develops constraint-aware memory provisioning framework
Introduces calibrated SSD simulator for validation
Tong Zhang
ScaleFlux, USA
Vikram Sharma Mailthody
NVIDIA, USA
Fei Sun
ScaleFlux, USA
Linsen Ma
Rensselaer Polytechnic Institute
C. Newburn
NVIDIA, USA
Teresa Zhang
Stanford University, USA
Yang Liu
ScaleFlux, USA
Jiangpeng Li
ScaleFlux, USA
Hao Zhong
Professor, Shanghai Jiao Tong University
Wen-Mei Hwu
NVIDIA, USA