🤖 AI Summary
This work targets the data-movement bottleneck in loop-based AI and high-performance computing programs with AutoLALA, a tool for automated locality analysis driven by a small domain-specific language (DSL). It lowers affine loop nests to polyhedral sets and maps and derives reuse distance and data movement complexity as fully symbolic, closed-form formulas, without stack simulation or recursive working-set models. The system is implemented in Rust, applies Barvinok counting to the polyhedral representation built via affine transformations, and ships with both a command-line tool and an interactive web playground. It supports locality analysis of representative operators including tensor contractions, einsum expressions, and stencil computations.
📝 Abstract
Data movement is the primary bottleneck in modern computing systems. For loop-based programs common in high-performance computing (HPC) and AI workloads, including matrix multiplication, tensor contraction, stencil computation, and einsum operations, the cost of moving data through the memory hierarchy often exceeds the cost of arithmetic.
This paper presents AutoLALA, an open-source tool that analyzes data locality in affine loop programs. The tool accepts programs written in a small domain-specific language (DSL), lowers them to polyhedral sets and maps, and produces closed-form symbolic formulas for reuse distance and data movement complexity. AutoLALA implements the fully symbolic locality analysis of Zhu et al. together with the data movement distance (DMD) framework of Smith et al. In particular, it computes reuse distance as the image of the access space under the access map, avoiding both stack simulation and Denning's recursive working-set formulation.
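To make the quantity concrete, the following is a minimal brute-force sketch of reuse distance measured on an explicit access trace. It is not AutoLALA's method (the tool derives the same kind of result symbolically, without enumerating a trace), and the helper names, the tiny loop nest, and the convention of counting the reused element itself are assumptions made for illustration.

```python
from collections import Counter


def reuse_distances(trace):
    """Histogram {distance: count}; first (cold) accesses are keyed by None.

    Assumed convention: the reuse distance of an access is the number of
    distinct elements touched from the previous access to the same element
    (inclusive) up to the current access (exclusive).
    """
    last_seen = {}
    hist = Counter()
    for t, addr in enumerate(trace):
        if addr in last_seen:
            window = trace[last_seen[addr]:t]
            hist[len(set(window))] += 1
        else:
            hist[None] += 1
        last_seen[addr] = t
    return hist


def trace_for_loop_nest(n, m):
    """Trace of: for i in 0..n { for j in 0..m { read A[j] } }."""
    return [("A", j) for _i in range(n) for j in range(m)]


N, M = 4, 5
hist = reuse_distances(trace_for_loop_nest(N, M))
```

For this loop nest every non-cold access has reuse distance M, so the histogram holds M cold accesses and (N - 1) * M accesses at distance M. A symbolic analysis reports that distribution as formulas in N and M rather than for one fixed problem size.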
We describe the DSL syntax and its formal semantics, the polyhedral lowering pipeline that constructs timestamp spaces and access maps via affine transformations, and the sequence of Barvinok counting operations used to derive symbolic reuse-interval and reuse-distance distributions. The system is implemented in Rust as a modular library spanning three crates, with safe bindings to the Barvinok library.
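As a hedged worked illustration of the objects named above (constructed for this summary, not taken from the paper), consider the loop nest `for i in 0..N { for j in 0..M { read A[j] } }`. Assuming a half-open reuse interval that starts at the earlier access, the timestamp space, access map, and resulting counts can be written as:

```latex
\begin{align*}
T &= \{\, [i,j] : 0 \le i < N,\ 0 \le j < M \,\} && \text{(timestamp space)} \\
\mathcal{A} &= \{\, [i,j] \to A[j] \,\} && \text{(access map)} \\
I(i,j) &= \{\, t \in T : [i,j] \preceq t \prec [i+1,j] \,\}, \quad 0 \le i < N-1 && \text{(reuse interval)} \\
|I(i,j)| &= (M - j) + j = M && \text{(reuse-interval length)} \\
|\mathcal{A}(I(i,j))| &= |\{A[0], \dots, A[M-1]\}| = M && \text{(reuse distance)}
\end{align*}
```

Here $\preceq$ denotes lexicographic order on timestamps. Both counts depend only on the symbolic parameter $M$, which is the kind of closed form that a counting step over parametric polyhedra produces.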
We provide both a command-line interface and an interactive web playground with LaTeX rendering of the output formulas. The tool handles arbitrary affine loop nests, covering workloads such as tensor contractions, einsum expressions, stencil computations, and general polyhedral programs.