🤖 AI Summary
Problem: Traditional reuse distance histogram (RDH) computation relies on dynamic execution and memory tracing, which incurs high overhead and hinders early-stage compiler optimization. Existing static approaches are efficient but suffer from low accuracy, particularly for complex reuse patterns in nested-loop array accesses. Method: We propose a static RDH prediction framework tailored to nested-loop array accesses, integrating loop-bound analysis, fine-grained access pattern extraction, and closed-form modeling to estimate reuse distance distributions and cache hit rates at compile time. Contribution/Results: Our method is the first to systematically capture cross-iteration and cross-dimensional reuse across multiple loop levels, significantly improving the completeness and accuracy of static RDH prediction. Experiments show that our approach runs 3–4 orders of magnitude faster than the trace-driven tool PARDA, with prediction error bounded within 5%, enabling scalable program performance modeling and practical compile-time optimization.
📝 Abstract
Efficient memory access patterns play a crucial role in application performance because they exploit temporal and spatial locality and thereby make better use of the cache. The Reuse Distance Histogram (RDH) is a widely used metric for quantifying temporal locality: it records the distribution of reuse distances, where a reuse distance is the distance between consecutive accesses to the same memory location, typically counted as the number of distinct memory locations accessed in between. Traditionally, calculating the RDH requires executing the program and collecting a memory trace to capture its dynamic memory access behavior. This trace collection is often time-consuming and resource-intensive, which makes it unsuitable for early-stage optimization or large-scale applications. Static prediction, on the other hand, offers a significant speedup in estimating the RDH and cache hit rates. However, existing static approaches lack accuracy because they must predict without running the program and therefore without knowing the complete memory access pattern, especially when arrays are accessed inside nested loops. This paper presents a novel static analysis framework for predicting the reuse profiles of array references in programs with nested loop structures, without requiring any runtime information. By combining loop-bound analysis, access patterns from smaller problem sizes, and predictive equations, our method predicts the access patterns of arrays and estimates reuse distances and cache hit rates at compile time. This paper extends our previous study by incorporating additional analysis and improving prediction accuracy for previously unhandled reuse patterns. We evaluate our technique against a widely accepted trace-driven profiling tool, Parallel Reuse Distance Analysis (PARDA). The results demonstrate that our static predictor achieves comparable accuracy while offering an orders-of-magnitude improvement in analysis speed. This work offers a practical alternative to dynamic reuse profiling and paves the way for integration into compilers and static performance modeling tools.
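To make the quantities in the abstract concrete, the following minimal Python sketch computes an RDH from a memory trace and derives a cache hit rate from it. It is an illustration only, not the paper's framework and not PARDA's algorithm. It assumes the common stack-distance definition (the number of distinct addresses accessed between two consecutive accesses to the same address) and a fully associative LRU cache, for which an access hits exactly when its reuse distance is smaller than the cache capacity. The toy loop nest and all function names (`reuse_distance_histogram`, `lru_hit_rate`, `nested_loop_trace`, `predicted_histogram`) are hypothetical.

```python
from collections import Counter

COLD = float("inf")  # reuse distance assigned to first-time (cold) accesses


def reuse_distance_histogram(trace):
    """Trace-driven RDH: maps reuse distance -> number of accesses."""
    stack = []  # LRU stack, most recently used address at the end
    hist = Counter()
    for addr in trace:
        if addr in stack:
            pos = stack.index(addr)
            # Distinct addresses touched since the previous access to addr.
            distance = len(stack) - 1 - pos
            stack.pop(pos)
        else:
            distance = COLD
        hist[distance] += 1
        stack.append(addr)
    return hist


def lru_hit_rate(hist, capacity):
    """Hit rate of a fully associative LRU cache holding `capacity` addresses."""
    total = sum(hist.values())
    hits = sum(count for dist, count in hist.items() if dist < capacity)
    return hits / total if total else 0.0


def nested_loop_trace(n, m):
    """Trace of the toy nest: for i in range(n): for j in range(m): read A[j]."""
    return [("A", j) for _ in range(n) for j in range(m)]


def predicted_histogram(n, m):
    """Closed form for the toy nest only (an assumption of this sketch):
    m cold accesses, then (n - 1) * m reuses at distance m - 1."""
    hist = Counter({COLD: m})
    if n > 1:
        hist[m - 1] = (n - 1) * m
    return hist


if __name__ == "__main__":
    traced = reuse_distance_histogram(nested_loop_trace(3, 4))
    print(traced == predicted_histogram(3, 4))
    print(lru_hit_rate(traced, capacity=4))
```

For this toy nest, the closed form reproduces the traced histogram without generating the trace at all, which is the kind of saving a static predictor aims for. Real loop nests with multi-dimensional indexing need considerably more analysis than this sketch shows.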