AI Summary
In remote memory architectures, cache misses incur high latency, and existing runtime prefetching techniques struggle to balance accuracy and overhead. This paper proposes FarSight, a kernel-level real-time prefetching system. FarSight introduces the first ordinal modeling approach that decouples application semantics from memory layout, enabling asynchronous GPU/CPU inference and multi-step lookahead prediction. It deploys lightweight LSTM or Transformer models online in a cache-resident setting, achieving high-accuracy prefetching with minimal overhead (<1% CPU utilization). Evaluated on four data-intensive workloads, FarSight improves performance by up to 3.6× over state-of-the-art remote memory systems. It significantly advances runtime deep-learning-based prefetching by overcoming key bottlenecks in generalizability, strict latency constraints, and seamless system integration.
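To make the core idea concrete, here is a minimal sketch of ordinal prefetch prediction. This is an illustration under our own assumptions, not FarSight's implementation: the `OrdinalResolver` class and its vocabulary-of-deltas design are hypothetical. The point it demonstrates is the decoupling the summary describes: the model emits an *ordinal* (an index into a small, compact vocabulary of recently observed access deltas) rather than a raw address, and a lightweight runtime mapping structure resolves that ordinal into a concrete prefetch target.

```python
# Hypothetical sketch of ordinal prefetching (not FarSight's actual code).
# The model predicts an ordinal -- an index into a small table of recently
# seen page-access deltas -- so one offline-trained model can generalize
# across different runtime memory layouts.

from collections import OrderedDict

class OrdinalResolver:
    """Maps model-predicted ordinals to concrete page addresses."""

    def __init__(self, vocab_size=8):
        self.vocab_size = vocab_size
        self.deltas = OrderedDict()  # delta -> None; insertion order = recency

    def observe(self, prev_page, cur_page):
        # Record the delta between consecutive accesses; keep only the
        # `vocab_size` most recent distinct deltas (the compact vocabulary).
        delta = cur_page - prev_page
        self.deltas.pop(delta, None)
        self.deltas[delta] = None
        if len(self.deltas) > self.vocab_size:
            self.deltas.popitem(last=False)  # evict the oldest delta

    def resolve(self, ordinal, cur_page):
        # Translate the model's ordinal into an address at runtime.
        recent = list(reversed(self.deltas))  # ordinal 0 = most recent delta
        if ordinal < len(recent):
            return cur_page + recent[ordinal]
        return None  # ordinal outside the current vocabulary

resolver = OrdinalResolver()
trace = [0x10, 0x20, 0x30, 0x40]  # strided access pattern (stride 0x10)
for a, b in zip(trace, trace[1:]):
    resolver.observe(a, b)

# If the model predicts ordinal 0 ("repeat the most recent delta"),
# the resolver turns it into a concrete prefetch target.
print(hex(resolver.resolve(0, 0x40)))  # -> 0x50
```

Because the model's output space is a handful of ordinals instead of a 64-bit address space, the prediction head stays tiny (helping it stay cache-resident), and the same trained model transfers across runs whose absolute addresses differ.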
Abstract
Modern software systems face increasing runtime performance demands, particularly in emerging architectures like far memory, where local-memory misses incur significant latency. While machine learning (ML) has proven effective in offline systems optimization, its application to high-frequency, runtime-level problems remains limited due to strict performance, generalization, and integration constraints. We present FarSight, a Linux-based far-memory system that leverages deep learning (DL) to efficiently perform accurate data prefetching. FarSight separates application semantics from runtime memory layout, allowing offline-trained DL models to predict access patterns using a compact vocabulary of ordinal possibilities, resolved at runtime through lightweight mapping structures. By combining asynchronous inference, lookahead prediction, and a cache-resident DL model, FarSight achieves high prediction accuracy with low runtime overhead. Our evaluation of FarSight on four data-intensive workloads shows that it outperforms the state-of-the-art far-memory system by up to 3.6 times. Overall, this work demonstrates the feasibility and advantages of applying modern ML techniques to complex, performance-critical software runtime problems.