🤖 AI Summary
Modern deep learning compilers lack a unified abstraction for jointly expressing tensor layouts across device distributions and intra-device memory hierarchies, hindering effective optimization. This work proposes the Axe layout abstraction, which maps logical tensor coordinates to a multi-axis physical space through named axes, thereby unifying support for tiling, sharding, replication, and offsetting. Building upon this abstraction, the authors design a multi-granularity, distribution-aware domain-specific language (DSL) and a hardware-aware compiler that integrates fine-grained thread-level control with collective communication primitives. Their approach achieves, for the first time, a unified layout representation spanning from device grids down to the thread level, yielding generated code that is portable and performs close to hand-tuned implementations.
📝 Abstract
Scaling modern deep learning workloads demands coordinated placement of data and compute across device meshes, memory hierarchies, and heterogeneous accelerators. We present Axe Layout, a hardware-aware abstraction that maps logical tensor coordinates to a multi-axis physical space via named axes. Axe unifies tiling, sharding, replication, and offsets across inter-device distribution and on-device layouts, enabling collective primitives to be expressed consistently from device meshes to threads. Building on Axe, we design a multi-granularity, distribution-aware DSL and compiler that composes thread-local control with collective operators in a single kernel. Experiments show that our unified approach brings performance close to that of hand-tuned kernels across the latest GPUs, multi-device environments, and accelerator backends.
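To make the core idea concrete, the sketch below illustrates what a named-axis layout might look like: a logical tensor index is decomposed into coordinates on a hierarchy of named physical axes, ordered from the device mesh down to per-thread elements. All class names, axis names, and the mixed-radix decomposition scheme here are illustrative assumptions, not the paper's actual DSL or API.

```python
# Hypothetical sketch of a named-axis layout in the spirit of Axe.
# Each Axis names one level of the physical hierarchy (device mesh,
# thread block, thread, per-thread element); the layout maps a 1-D
# logical index to a coordinate on every axis.
from dataclasses import dataclass

@dataclass
class Axis:
    name: str   # e.g. "device", "block", "thread", "elem"
    size: int   # extent of this physical axis

class AxeLayout:
    """Maps a 1-D logical index to coordinates on named physical axes.

    Axes are ordered outermost-first (device mesh) to innermost-last
    (per-thread element), so the mapping is a mixed-radix decomposition
    of the logical index. Tiling and sharding correspond to how the
    logical range is split across these axes; replication would map
    the same logical index to several coordinates on one axis.
    """
    def __init__(self, axes):
        self.axes = axes

    def logical_to_physical(self, idx):
        coords = {}
        # Peel off mixed-radix digits, innermost axis first.
        for ax in reversed(self.axes):
            coords[ax.name] = idx % ax.size
            idx //= ax.size
        return coords

# Example: 2 devices x 4 blocks x 8 threads x 2 elements = 128 logical elements.
layout = AxeLayout([Axis("device", 2), Axis("block", 4),
                    Axis("thread", 8), Axis("elem", 2)])
print(layout.logical_to_physical(37))
# 37 = ((0*4 + 2)*8 + 2)*2 + 1 -> device 0, block 2, thread 2, elem 1
```

The point of the unified representation is that the same mapping vocabulary covers both the inter-device axes (where a coordinate selects a shard on the device mesh) and the intra-device axes (where it selects a block, thread, or register element), so a compiler can reason about data movement at every level with one abstraction.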