🤖 AI Summary
Existing user-schedulable languages (USLs) struggle to reconcile fine-grained programmer control over scheduling with automated performance optimization. This paper presents Exo 2, a scheduling language that lets programmers define new scheduling operations externally to the compiler and compose trusted, fine-grained primitives into custom scheduling libraries. The authors identify actions (ways of modifying code), inspection (ways of interrogating code), and references (ways of pointing to code) as essential to any user-extensible USL, and fuse them into Cursors, a mechanism that enables safe, user-driven growth of the scheduling language in user code. Evaluation across more than 80 high-performance kernels shows that Exo 2 reduces total scheduling code by an order of magnitude while delivering performance competitive with state-of-the-art implementations on three different platforms.
📝 Abstract
User-schedulable languages (USLs) help programmers productively optimize programs by providing safe means of transforming them. Current USLs are designed to give programmers exactly the control they want, while automating all other concerns. However, there is no universal answer for what performance-conscious programmers want to control, how they want to control it, and what they want to automate, even in relatively narrow domains. We claim that USLs should, instead, be designed to grow. We present Exo 2, a scheduling language that enables users to define new scheduling operations externally to the compiler. By composing a set of trusted, fine-grained primitives, users can safely write their own scheduling library to build up desired automation. We identify actions (ways of modifying code), inspection (ways of interrogating code), and references (ways of pointing to code) as essential for any user-extensible USL. We fuse these ideas into a new mechanism called Cursors that enables the creation of scheduling libraries in user code. We demonstrate libraries that amortize scheduling effort across more than 80 high-performance kernels, reducing total scheduling code by an order of magnitude and delivering performance competitive with state-of-the-art implementations on three different platforms.
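To make the abstract's three ingredients concrete, here is a minimal, hypothetical sketch of the idea: a tiny loop-nest IR, cursors as references into the program, a few trusted primitive actions with safety checks, an inspection predicate, and a user-level library operation composed from them. None of these names (`Cursor`, `split`, `reorder`, `tile2d`) are the actual Exo 2 API; they only illustrate the pattern of growing a scheduling library outside the compiler.

```python
# Hypothetical sketch of the USL-extensibility pattern, NOT the real Exo 2 API:
# trusted fine-grained primitives + cursors compose into a user scheduling library.
from dataclasses import dataclass, field


@dataclass
class Loop:
    var: str
    extent: int
    body: list = field(default_factory=list)  # nested Loops or statement strings


# --- Reference: a cursor points at a node inside the program tree ------------
@dataclass(frozen=True)
class Cursor:
    path: tuple  # child indices from the root, e.g. (0, 0)


def resolve(prog, cur):
    node = prog
    for i in cur.path:
        node = node.body[i]
    return node


# --- Actions: trusted primitives, each performing one checked transformation --
def split(prog, cur, factor, new_vars):
    """Split the referenced loop into outer/inner loops of extents N//factor
    and factor. Safety check: the extent must be divisible by factor."""
    loop = resolve(prog, cur)
    assert loop.extent % factor == 0, "split: extent not divisible by factor"
    outer, inner = new_vars
    loop_inner = Loop(inner, factor, loop.body)
    loop.var, loop.extent, loop.body = outer, loop.extent // factor, [loop_inner]
    return prog


def reorder(prog, cur):
    """Swap the referenced loop with its single, perfectly nested child loop."""
    outer = resolve(prog, cur)
    assert len(outer.body) == 1 and isinstance(outer.body[0], Loop)
    inner = outer.body[0]
    outer.var, inner.var = inner.var, outer.var
    outer.extent, inner.extent = inner.extent, outer.extent
    return prog


# --- Inspection: interrogate the code without modifying it --------------------
def is_divisible(prog, cur, factor):
    return resolve(prog, cur).extent % factor == 0


# --- User-defined scheduling library: composed entirely from the above --------
def tile2d(prog, factor):
    """Tile a 2-D loop nest i/j into io, jo, ii, ji — a 'grown' operation
    defined outside the compiler, safe because each step is a checked primitive."""
    assert is_divisible(prog, Cursor(()), factor)
    split(prog, Cursor(()), factor, ("io", "ii"))        # i  -> io, ii
    split(prog, Cursor((0, 0)), factor, ("jo", "ji"))    # j  -> jo, ji
    reorder(prog, Cursor((0,)))                          # ii <-> jo
    return prog


def loop_order(prog):
    """Walk the perfectly nested spine, listing (var, extent) pairs."""
    out, node = [], prog
    while isinstance(node, Loop):
        out.append((node.var, node.extent))
        node = node.body[0]
    return out
```

For example, applying `tile2d` with factor 4 to an 8x8 nest `for i: for j: C[i,j] += ...` yields the loop order `io, jo, ii, ji`. The point mirrored from the paper is that `tile2d` lives in user code: because it only calls checked primitives through cursors, the user library cannot produce an unsound transformation, yet new automation can be layered on without touching the compiler.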