🤖 AI Summary
Existing memory tiering systems rely on manually tuned, fixed thresholds, making it difficult to sustain high performance across diverse workloads. This paper proposes a parameter-free adaptive memory tiering mechanism. First, it designs short- and long-term moving-average hot-page detectors to dynamically classify page temperature. Second, it formulates a cost-benefit–driven migration decision model that eliminates threshold dependency. Third, it introduces a bandwidth-aware batch scheduling policy to improve I/O efficiency. Deeply integrated into the kernel’s memory management subsystem, the solution requires no configuration and delivers out-of-the-box usability. Evaluated across multiple workload classes, it achieves over 97% of the performance of the best manually tuned baseline, while outperforming the unoptimized baseline by 1.26×–2.3×. The approach significantly enhances robustness and practicality in real-world deployment.
📝 Abstract
Memory tiering systems seek cost-effective memory scaling by adding multiple tiers of memory. For maximum performance, frequently accessed (hot) data must be placed close to the host in faster tiers, while infrequently accessed (cold) data can be placed in farther, slower memory tiers. Existing tiering solutions such as HeMem, Memtis, and TPP use rigid policies with pre-configured thresholds to make data placement and migration decisions. We perform a thorough evaluation of the threshold choices and show that no single set of thresholds performs well for all workloads and configurations, and that tuning can provide substantial speedups. Our evaluation identified three primary reasons why tuning helps: better hot/cold page identification, reduced wasteful migrations, and more timely migrations.
Based on this study, we designed ARMS (Adaptive and Robust Memory tiering System) to provide high performance without tunable thresholds. We develop a novel hot/cold page identification mechanism relying on short-term and long-term moving averages, an adaptive migration policy based on cost/benefit analysis, and a bandwidth-aware batched migration scheduler. Combined, these approaches provide out-of-the-box performance within 3% of the best tuned performance of prior systems, and 1.26x-2.3x better than prior systems without tuning.
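To make the two key ideas concrete, here is a minimal sketch of (a) hot/cold classification by comparing a short-term and a long-term exponential moving average of per-page access counts, and (b) a threshold-free cost/benefit migration test. All names, the smoothing factors, and the latency/cost parameters below are illustrative assumptions, not the actual constants or data structures used by ARMS.

```python
# Sketch of moving-average hot-page detection plus a cost/benefit
# migration decision. Smoothing factors are assumed values: the
# short-term EMA reacts quickly to bursts, the long-term EMA tracks
# sustained behavior.
SHORT_ALPHA = 0.5
LONG_ALPHA = 0.05

class Page:
    def __init__(self):
        self.short_ema = 0.0  # short-term access-rate estimate
        self.long_ema = 0.0   # long-term access-rate estimate

    def record_interval(self, accesses):
        # Fold the access count from the latest sampling interval
        # into both moving averages.
        self.short_ema = SHORT_ALPHA * accesses + (1 - SHORT_ALPHA) * self.short_ema
        self.long_ema = LONG_ALPHA * accesses + (1 - LONG_ALPHA) * self.long_ema

    def is_hot(self):
        # A page counts as hot when its recent activity exceeds its
        # long-term baseline, i.e. it is heating up or staying busy.
        return self.short_ema > self.long_ema

def should_migrate(page, fast_latency_ns, slow_latency_ns,
                   migration_cost_ns, horizon_intervals):
    # Cost/benefit test instead of a fixed hotness threshold:
    # migrate only if the expected latency savings over an assumed
    # future horizon outweigh the one-time migration cost.
    saving_per_access = slow_latency_ns - fast_latency_ns
    expected_accesses = page.short_ema * horizon_intervals
    return expected_accesses * saving_per_access > migration_cost_ns
```

A page that receives a sustained burst of accesses quickly satisfies `is_hot()`, and `should_migrate` then approves promotion only when the projected savings exceed the copy cost, so no per-workload threshold needs to be tuned.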