Revelator: Rapid Data Fetching via OS-Driven Hash-based Speculative Address Translation

📅 2025-08-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Address translation remains a critical performance bottleneck in modern systems, primarily due to the unpredictable nature of virtual-to-physical address (VA→PA) mappings. Existing solutions either rely on large pages or contiguous mappings, whose availability cannot be guaranteed, or incur prohibitive hardware modification costs for marginal gains. This paper proposes a software–hardware co-designed predictive address translation mechanism: the OS implements hash-driven memory allocation to establish predictable VA→PA mappings, while the hardware integrates a lightweight speculative engine that prefetches data and multi-level page table entries concurrently. Crucially, the approach eliminates dependence on large pages or memory-contiguity assumptions, substantially reducing TLB miss penalties. Evaluated across 11 data-intensive benchmarks, it achieves a 27% average speedup in native execution and 20% in virtualized environments, with a 9% improvement in energy efficiency. RTL validation confirms minimal hardware overhead.

📝 Abstract
Address translation is a major performance bottleneck in modern computing systems. Speculative address translation can hide this latency by predicting the physical address (PA) of requested data early in the pipeline. However, predicting the PA from the virtual address (VA) is difficult due to the unpredictability of VA-to-PA mappings in conventional OSes. Prior works try to overcome this but face two key issues: (i) reliance on large pages or VA-to-PA contiguity, which is not guaranteed, and (ii) costly hardware changes to store speculation metadata with limited effectiveness. We introduce Revelator, a hardware-OS cooperative scheme enabling highly accurate speculative address translation with minimal modifications. Revelator employs a tiered hash-based allocation strategy in the OS to create predictable VA-to-PA mappings, falling back to conventional allocation when needed. On a TLB miss, a lightweight speculation engine, guided by this policy, generates candidate PAs for both program data and last-level page table entries (PTEs). Thus, Revelator (i) speculatively fetches requested data before translation resolves, reducing access latency, and (ii) fetches the fourth-level PTE before the third-level PTE is accessed, accelerating page table walks. We prototype Revelator's OS support in Linux and evaluate it in simulation across 11 diverse, data-intensive benchmarks in native and virtualized environments. Revelator achieves average speedups of 27% (20%) in native (virtualized) settings, surpasses a state-of-the-art speculative mechanism by 5%, and reduces energy use by 9% compared to baseline. Our RTL prototype shows minimal area and power overheads on a modern CPU.
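The tiered allocation policy the abstract describes can be sketched as a toy model: hash the virtual page number to a preferred physical frame, and fall back to conventional allocation when that frame is taken. This is an illustrative assumption, not Revelator's actual implementation; the frame count and the multiplicative hash constant are hypothetical.

```python
# Toy sketch of tiered hash-based page allocation (illustrative only;
# NUM_FRAMES and the hash constant are assumptions, not from the paper).

NUM_FRAMES = 1 << 20                      # physical frames in a hypothetical machine
free_frames = set(range(NUM_FRAMES))

def hash_frame(vpn: int) -> int:
    """Map a virtual page number to its hash-preferred physical frame."""
    return (vpn * 0x9E3779B1) % NUM_FRAMES  # multiplicative hash (assumed)

def allocate(vpn: int) -> tuple[int, bool]:
    """Return (frame, predictable). Try the hash-preferred frame first;
    fall back to conventional (arbitrary) allocation if it is occupied."""
    preferred = hash_frame(vpn)
    if preferred in free_frames:
        free_frames.remove(preferred)
        return preferred, True            # mapping is hash-predictable
    frame = free_frames.pop()             # conventional fallback path
    return frame, False                   # speculation on this page will miss
```

Because most pages land on the predictable path, the hardware can later recompute `hash_frame(vpn)` on a TLB miss to guess the physical address; pages allocated via the fallback path simply mispredict and are corrected by the normal page table walk.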
Problem

Research questions and friction points this paper is trying to address.

Reducing address translation latency in modern computing systems
Overcoming unpredictability of VA-to-PA mappings in conventional OSes
Minimizing hardware changes for effective speculative address translation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hardware-OS cooperative scheme for address translation
Tiered hash-based allocation for predictable mappings
Lightweight speculation engine for data and PTEs
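The miss-time flow implied by the bullets above, in which the speculation engine issues a hash-guided data fetch in parallel with the walk and verifies it once translation resolves, might look like the following toy model. The `fetch` interface, the in-memory `page_table` dict, and the hash are stand-ins for hardware, not the paper's design.

```python
# Hedged sketch of speculative translation on a TLB miss (toy model;
# memory, page_table, and the hash function are hypothetical stand-ins).

NUM_FRAMES = 1 << 20
memory: dict[int, object] = {}            # frame number -> cached data

def hash_frame(vpn: int) -> int:
    """Recompute the OS allocation hash to predict the physical frame."""
    return (vpn * 0x9E3779B1) % NUM_FRAMES

def fetch(frame: int):
    """Stand-in for a memory access to the given physical frame."""
    return memory.get(frame)

def on_tlb_miss(vpn: int, page_table: dict[int, int]):
    candidate = hash_frame(vpn)           # speculative PA from the hash policy
    speculative_data = fetch(candidate)   # data fetch starts before translation resolves
    actual = page_table[vpn]              # page table walk resolves the true frame
    if actual == candidate:
        return speculative_data           # speculation hit: translation latency hidden
    return fetch(actual)                  # mispredict: discard and fetch the real frame
```

In the paper's design the same idea also prefetches the fourth-level PTE before the third-level PTE is accessed, overlapping steps of the page table walk itself; the sketch shows only the data-fetch case.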
Authors

Konstantinos Kanellopoulos, ETH Zürich
Konstantinos Sgouras, ETH Zürich
Andreas Kosmas Kakolyris, ETH Zürich (Computer Architecture)
Vlad-Petru Nitu, ETH Zürich
Berkin Kerim Konar, ETH Zürich
Rahul Bera, ETH Zürich (Computer Architecture, Microarchitecture, Memory Systems, Prefetching)
Onur Mutlu, ETH Zürich