LEGO: Layout Expression for Generating One-to-one Mapping

📅 2025-05-12
🤖 AI Summary
To address the optimization limits imposed by tight coupling between data layout and computation on GPUs, this paper proposes a layout-agnostic computational abstraction: computations are first expressed in a layout-decoupled form, and hierarchical parallel index expressions are then derived automatically from layout specifications. The core contribution is the first end-to-end “layout → index expression” mapping mechanism, which enables layout-driven code generation and the exploration of data layouts in combination with other optimizations. The authors design a custom layout specification language and integrate it with MLIR, Triton, and CUDA templates to build an index derivation engine. Experimental evaluation shows that the generated code achieves performance competitive with Triton kernels. The approach is also validated for generality across both MLIR-based and CUDA-based compilation ecosystems.

๐Ÿ“ Abstract
We describe LEGO, a new approach to optimizing data movement whereby code is expressed as a layout-independent computation and composed with layouts for data and computation. This code generator organization derives complex indexing expressions associated with hierarchical parallel code and data movement for GPUs. LEGO maps from layout specification to indexing expressions, and can be integrated into existing compilers and code templates. It facilitates the exploration of data layouts in combination with other optimizations. We demonstrate LEGO's integration with the MLIR and Triton compilers, and with CUDA templates. We show that LEGO delivers performance competitive with Triton, and that its integration with MLIR and CUDA shows broad applicability.
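
As a rough illustration of what mapping a layout specification to an indexing expression can look like, here is a minimal Python sketch. It is not LEGO's specification language or API: the helpers `row_major`, `tiled_layout`, `is_one_to_one`, and `store_with_layout` are hypothetical names invented for this example, and the tiled layout is just one plausible layout choice. The point is that the copy loop is written once over logical indices, while the layout function alone decides where each element lands in memory.

```python
from itertools import product


def row_major(shape):
    """Hypothetical helper: map a multi-index to a flat row-major offset."""
    def index(*idx):
        offset = 0
        for i, n in zip(idx, shape):
            assert 0 <= i < n, "index out of bounds"
            offset = offset * n + i
        return offset
    return index


def tiled_layout(rows, cols, tile_rows, tile_cols):
    """Hypothetical tiled layout: tiles are stored contiguously, row-major
    both across tiles and within each tile."""
    assert rows % tile_rows == 0 and cols % tile_cols == 0
    outer = row_major((rows // tile_rows, cols // tile_cols))
    inner = row_major((tile_rows, tile_cols))
    tile_size = tile_rows * tile_cols

    def index(i, j):
        tile = outer(i // tile_rows, j // tile_cols)
        within = inner(i % tile_rows, j % tile_cols)
        return tile * tile_size + within
    return index


def is_one_to_one(layout, rows, cols):
    """Check that every logical index maps to a distinct offset in range."""
    offsets = sorted(layout(i, j) for i, j in product(range(rows), range(cols)))
    return offsets == list(range(rows * cols))


def store_with_layout(src, layout, rows, cols):
    """Layout-independent computation: the loop uses logical (i, j) only."""
    dst = [None] * (rows * cols)
    for i, j in product(range(rows), range(cols)):
        dst[layout(i, j)] = src[i][j]
    return dst


tiled = tiled_layout(4, 4, 2, 2)
plain = row_major((4, 4))
matrix = [[4 * i + j for j in range(4)] for i in range(4)]
tiled_buffer = store_with_layout(matrix, tiled, 4, 4)
plain_buffer = store_with_layout(matrix, plain, 4, 4)
```

Swapping `plain` for `tiled` changes the memory order without touching `store_with_layout`, which is the kind of layout exploration the abstract describes, here reduced to a toy scale.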

Problem

Research questions and friction points this paper is trying to address.

Optimizing data movement via layout-independent computation
Generating complex indexing for hierarchical GPU code
Exploring data layouts with compiler integration

Innovation

Methods, ideas, or system contributions that make the work stand out.

Layout-independent computation with data layouts
Generates complex GPU indexing expressions
Integrates with MLIR, Triton, and CUDA
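
To make "hierarchical GPU indexing" concrete, the following Python sketch emulates how a two-level (block, thread) index might be turned into the logical element each thread handles. This is an assumption-laden illustration rather than code produced by LEGO: `thread_to_element` and `covered_elements` are hypothetical names, and the decomposition assumes one block per tile and one thread per element.

```python
def thread_to_element(block_id, thread_id, cols, tile_rows, tile_cols):
    """Map a (block, thread) pair to the logical (row, col) it handles.

    Assumes blocks enumerate tiles in row-major order and threads
    enumerate the elements of a tile in row-major order.
    """
    tiles_per_row = cols // tile_cols
    tile_r, tile_c = divmod(block_id, tiles_per_row)
    local_r, local_c = divmod(thread_id, tile_cols)
    return tile_r * tile_rows + local_r, tile_c * tile_cols + local_c


def covered_elements(rows, cols, tile_rows, tile_cols):
    """List the element handled by every (block, thread) pair."""
    num_blocks = (rows // tile_rows) * (cols // tile_cols)
    threads_per_block = tile_rows * tile_cols
    return [
        thread_to_element(b, t, cols, tile_rows, tile_cols)
        for b in range(num_blocks)
        for t in range(threads_per_block)
    ]


elements = covered_elements(4, 4, 2, 2)
```

Writing such `divmod` chains by hand for every tile shape and nesting depth is error-prone, which is the friction that deriving them from a layout description is meant to remove.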