Deep-Learning-Driven Prefetching for Far Memory

📅 2025-05-31
🤖 AI Summary
In remote memory architectures, cache misses incur high latency, and existing runtime prefetching techniques struggle to balance accuracy against overhead. This paper proposes FarSight, a kernel-level real-time prefetching system. FarSight introduces the first ordinal modeling approach that decouples semantic meaning from memory layout, enabling asynchronous GPU/CPU inference and multi-step forward prediction. It employs lightweight LSTM or Transformer models deployed online within cache-resident environments, achieving high-accuracy prefetching with minimal overhead (<1% CPU utilization). Evaluated on four data-intensive workloads, FarSight improves performance by up to 3.6× over the state-of-the-art far-memory system. It advances runtime deep-learning-based prefetching by addressing three key bottlenecks: generalizability, strict latency constraints, and system integration.

๐Ÿ“ Abstract
Modern software systems face increasing runtime performance demands, particularly in emerging architectures like far memory, where local-memory misses incur significant latency. While machine learning (ML) has proven effective in offline systems optimization, its application to high-frequency, runtime-level problems remains limited due to strict performance, generalization, and integration constraints. We present FarSight, a Linux-based far-memory system that leverages deep learning (DL) to efficiently perform accurate data prefetching. FarSight separates application semantics from runtime memory layout, allowing offline-trained DL models to predict access patterns using a compact vocabulary of ordinal possibilities, resolved at runtime through lightweight mapping structures. By combining asynchronous inference, lookahead prediction, and a cache-resident DL model, FarSight achieves high prediction accuracy with low runtime overhead. Our evaluation of FarSight on four data-intensive workloads shows that it outperforms the state-of-the-art far-memory system by up to 3.6 times. Overall, this work demonstrates the feasibility and advantages of applying modern ML techniques to complex, performance-critical software runtime problems.
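The abstract's idea of predicting from a "compact vocabulary of ordinal possibilities, resolved at runtime through lightweight mapping structures" can be illustrated with a small sketch. This is not FarSight's implementation: the class `OrdinalMap`, the `vocab_size` parameter, the round-robin slot replacement, and the helper `resolve_lookahead` are all hypothetical names and design choices made for illustration. The sketch assumes the model emits small integers (ordinals) rather than addresses, and that a per-page table maps each ordinal to a concrete successor page observed at runtime.

```python
class OrdinalMap:
    """Hypothetical per-page table: slot index (ordinal) -> observed successor page."""

    def __init__(self, vocab_size=4):
        self.vocab_size = vocab_size  # assumed size of the ordinal vocabulary
        self.slots = {}               # page -> list of successor pages
        self.next_victim = {}         # page -> slot to overwrite once full

    def record(self, page, successor):
        """Record that `successor` followed `page`; return the ordinal assigned to it."""
        slots = self.slots.setdefault(page, [])
        if successor in slots:
            return slots.index(successor)
        if len(slots) < self.vocab_size:
            slots.append(successor)
            return len(slots) - 1
        # Table full: overwrite slots in round-robin order (an illustrative policy).
        victim = self.next_victim.get(page, 0)
        slots[victim] = successor
        self.next_victim[page] = (victim + 1) % self.vocab_size
        return victim

    def resolve(self, page, ordinal):
        """Translate a predicted ordinal into a concrete page, or None if unknown."""
        slots = self.slots.get(page, [])
        return slots[ordinal] if 0 <= ordinal < len(slots) else None


def resolve_lookahead(omap, page, ordinals):
    """Chase a sequence of predicted ordinals to get several pages to prefetch."""
    prefetch = []
    for ordinal in ordinals:
        page = omap.resolve(page, ordinal)
        if page is None:
            break  # the mapping has no entry yet; stop the lookahead here
        prefetch.append(page)
    return prefetch


# Build the map from a toy access trace of page numbers.
trace = [0x10, 0x20, 0x30, 0x10, 0x40]
omap = OrdinalMap(vocab_size=4)
for current, following in zip(trace, trace[1:]):
    omap.record(current, following)

# A model trained offline would supply the ordinals; here they are hard-coded.
two_step = resolve_lookahead(omap, 0x10, [0, 0])
alternate = resolve_lookahead(omap, 0x10, [1])
```

Because the model's output space in this sketch is only `vocab_size` ordinals, the same trained model could in principle be reused across runs where the concrete page numbers differ; only the mapping table would need to be rebuilt at runtime.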
Problem

Research questions and friction points this paper is trying to address.

Reducing latency in far-memory systems with deep learning prefetching
Applying ML to high-frequency runtime problems with low overhead
Improving prediction accuracy for data access patterns in far memory
Innovation

Methods, ideas, or system contributions that make the work stand out.

Deep learning for far-memory prefetching
Offline-trained models with runtime mapping
Asynchronous inference with low overhead
Yutong Huang
University of California, San Diego
Operating System
Zhiyuan Guo
University of California, San Diego
Yiying Zhang
University of California, San Diego, San Diego, CA, USA; GenseeAI Inc., San Diego, CA, USA