AI Summary
In remote memory architectures, cache misses incur high latency, and existing runtime prefetching techniques struggle to balance accuracy and overhead. This paper proposes FarSight, a kernel-level real-time prefetching system. FarSight introduces the first ordinal modeling approach that decouples application semantics from memory layout, enabling asynchronous GPU/CPU inference and multi-step lookahead prediction. It deploys lightweight LSTM or Transformer models online in a cache-resident setting, achieving high-accuracy prefetching with minimal overhead (<1% CPU utilization). Evaluated on four data-intensive workloads, FarSight improves performance by up to 3.6× over state-of-the-art remote memory systems. It significantly advances runtime deep-learning-based prefetching by overcoming key bottlenecks in generalizability, strict latency constraints, and seamless system integration.
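To make the core idea concrete, here is a minimal sketch of ordinal prefetch prediction. This is an illustration under our own assumptions, not FarSight's implementation: the `OrdinalResolver` class and its vocabulary-of-deltas design are hypothetical. The point it demonstrates is the decoupling the summary describes: the model emits an *ordinal* (an index into a small, compact vocabulary of recently observed access deltas) rather than a raw address, and a lightweight runtime mapping structure resolves that ordinal into a concrete prefetch target.

```python
# Hypothetical sketch of ordinal prefetching (not FarSight's actual code).
# The model predicts an ordinal -- an index into a small table of recently
# seen page-access deltas -- so one offline-trained model can generalize
# across different runtime memory layouts.

from collections import OrderedDict

class OrdinalResolver:
    """Maps model-predicted ordinals to concrete page addresses."""

    def __init__(self, vocab_size=8):
        self.vocab_size = vocab_size
        self.deltas = OrderedDict()  # delta -> None; insertion order = recency

    def observe(self, prev_page, cur_page):
        # Record the delta between consecutive accesses; keep only the
        # `vocab_size` most recent distinct deltas (the compact vocabulary).
        delta = cur_page - prev_page
        self.deltas.pop(delta, None)
        self.deltas[delta] = None
        if len(self.deltas) > self.vocab_size:
            self.deltas.popitem(last=False)  # evict the oldest delta

    def resolve(self, ordinal, cur_page):
        # Translate the model's ordinal into an address at runtime.
        recent = list(reversed(self.deltas))  # ordinal 0 = most recent delta
        if ordinal < len(recent):
            return cur_page + recent[ordinal]
        return None  # ordinal outside the current vocabulary

resolver = OrdinalResolver()
trace = [0x10, 0x20, 0x30, 0x40]  # strided access pattern (stride 0x10)
for a, b in zip(trace, trace[1:]):
    resolver.observe(a, b)

# If the model predicts ordinal 0 ("repeat the most recent delta"),
# the resolver turns it into a concrete prefetch target.
print(hex(resolver.resolve(0, 0x40)))  # -> 0x50
```

Because the model's output space is a handful of ordinals instead of a 64-bit address space, the prediction head stays tiny (helping it stay cache-resident), and the same trained model transfers across runs whose absolute addresses differ.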
Abstract
Modern software systems face increasing runtime performance demands, particularly in emerging architectures like far memory, where local-memory misses incur significant latency. While machine learning (ML) has proven effective in offline systems optimization, its application to high-frequency, runtime-level problems remains limited due to strict performance, generalization, and integration constraints. We present FarSight, a Linux-based far-memory system that leverages deep learning (DL) to efficiently perform accurate data prefetching. FarSight separates application semantics from runtime memory layout, allowing offline-trained DL models to predict access patterns using a compact vocabulary of ordinal possibilities, resolved at runtime through lightweight mapping structures. By combining asynchronous inference, lookahead prediction, and a cache-resident DL model, FarSight achieves high prediction accuracy with low runtime overhead. Our evaluation of FarSight on four data-intensive workloads shows that it outperforms the state-of-the-art far-memory system by up to 3.6 times. Overall, this work demonstrates the feasibility and advantages of applying modern ML techniques to complex, performance-critical software runtime problems.