🤖 AI Summary
Traditional empirical models struggle to accurately predict cache behavior for affine loop nests in scientific computing and tensor operations. This work proposes the first fully symbolic locality theory, modeling data locality as polynomials in input sizes and cache parameters. By integrating affine loop analysis, virtual reuse modeling, and compiler infrastructure, the approach derives quadratic and reciprocal scaling laws for cache misses. It transcends the limitations of empirical rules, enabling high-precision prediction of data movement across arbitrary input scales and cache configurations. Evaluation on 41 scientific kernels and tensor programs shows that locality polynomials are generated in an average of 41 seconds, with cache miss predictions taking under 1 millisecond and achieving 99.6% accuracy.
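The core idea — derive a symbolic polynomial once, then answer prediction queries by plain evaluation — can be illustrated with a minimal sketch. The function name, terms, and coefficients below are hypothetical, not the paper's actual derivation; they only show why evaluating a derived locality polynomial at a concrete input size and cache configuration takes well under a millisecond.

```python
# Hypothetical sketch of a derived locality polynomial.
# A compiler-derived miss model typically mixes polynomial terms in the
# input size N with reciprocal terms in cache parameters such as the
# number of elements per cache line.

def miss_polynomial(n: int, line_elems: int) -> float:
    """Predicted cache misses for a hypothetical N x N kernel.

    Quadratic scaling: roughly N^2 distinct elements are touched,
    amortized over `line_elems` elements per cache line (the
    reciprocal dependence on cache configuration).
    Coefficients here are illustrative placeholders.
    """
    return (n * n) / line_elems + (2 * n) / line_elems + 1

# Prediction for one (input size, cache configuration) pair is a single
# arithmetic evaluation of the closed-form expression:
predicted = miss_polynomial(1024, 16)
print(predicted)
```

Because the expensive symbolic analysis happens once per kernel, any number of (input size, cache configuration) queries afterward cost only this arithmetic.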
📝 Abstract
This paper presents a new theory of locality and its compiler support. The theory is fully symbolic, deriving locality as polynomials, and the compiler analysis supports affine loop nests. Together they derive cache-performance scaling as quadratic and reciprocal expressions that are more general and precise than empirical scaling rules. Evaluated on a benchmark suite of 41 scientific kernels and tensor operations, the compiler requires an average of 41 seconds to derive the locality polynomials. After derivation, predicting the cache miss count for any given input size and cache configuration takes less than a millisecond. Across all tests, with and without loop fusion, the accuracy of the data movement prediction is 99.6%, compared against a simulated set-associative L1 data cache.