Scalable Packed Layouts for Vector-Length-Agnostic ML Code Generation

📅 2026-05-12

🤖 AI Summary

This work addresses the challenge of efficiently generating vector-length-agnostic (VLA) machine learning code for scalable vector instruction sets such as Arm SVE, where the vector length is unknown at compile time and traditional compilers therefore cannot fix tiling and data layout decisions in advance. The authors present the first end-to-end VLA support in MLIR/IREE, introducing vector-length-aware packed data layouts and unifying dynamic tiling, operator fusion, and scalable vectorization within a single compilation framework. Evaluated on Arm CPUs, the generated SVE code achieves up to 1.45× speedup over IREE's NEON-based code generation and outperforms several PyTorch ecosystem frameworks (ExecuTorch, TorchInductor, and eager execution). A simulator-based study further shows that the generated code scales with increasing SVE vector length on compute-bound workloads, balancing performance and hardware portability.
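To make "vector-length-agnostic" concrete, the following is a minimal Python emulation of a VLA loop; it is not SVE intrinsics and not IREE output. It assumes f32 elements, SVE's rule that the vector length is a multiple of 128 bits between 128 and 2048, and MLIR's convention that `vscale = VL / 128` (so a `vector<[4]xf32>` holds `4 * vscale` lanes). The helper names `lanes_for_f32` and `vla_axpy` are hypothetical. The loop step is a run-time value, and a per-lane predicate in the style of SVE's `WHILELT` handles the tail, so one piece of code is correct for every vector length.

```python
from typing import List


def lanes_for_f32(vl_bits: int) -> int:
    """Number of 32-bit lanes for an SVE vector length given in bits (hypothetical helper)."""
    # SVE vector lengths are multiples of 128 bits in [128, 2048].
    if vl_bits % 128 != 0 or not 128 <= vl_bits <= 2048:
        raise ValueError("SVE vector length must be a multiple of 128 in [128, 2048]")
    vscale = vl_bits // 128  # MLIR's vscale: number of 128-bit granules
    return 4 * vscale        # lanes of a vector<[4]xf32>


def vla_axpy(a: float, x: List[float], y: List[float], vl_bits: int) -> List[float]:
    """Compute a*x + y with a loop whose step is only known at run time."""
    n = len(x)
    step = lanes_for_f32(vl_bits)  # not a compile-time constant
    out = list(y)
    i = 0
    while i < n:
        # WHILELT-style predicate: lane k is active iff i + k < n,
        # so no separate scalar epilogue is needed for the tail.
        active = [i + k < n for k in range(step)]
        for k in range(step):
            if active[k]:
                out[i + k] = a * x[i + k] + y[i + k]
        i += step
    return out
```

Calling `vla_axpy(2.0, [1.0, 2.0, 3.0, 4.0, 5.0], [1.0] * 5, vl)` with `vl` set to 128, 256, or 2048 yields the same result; only the number of loop iterations changes. This is the property that makes compile-time tiling harder: any tile size tied to `step` is itself a run-time quantity.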

📝 Abstract

Scalable vector instruction sets such as Arm SVE enable vector-length-agnostic (VLA) execution, allowing a single implementation to adapt across hardware with different vector lengths. However, they complicate compiler code generation, as tiling and data layout decisions can no longer be fixed at compile time. We present an approach for enabling VLA code generation in an end-to-end ML compilation pipeline through vector-length-aware packed data layouts and corresponding compiler extensions. We integrate these mechanisms into MLIR/IREE and extend tiling, fusion, and vectorization to operate with scalable vector lengths. Evaluated on real-world ML workloads on Arm CPUs, our approach generates SVE code that is competitive with, and often outperforms, existing NEON-based code generation within IREE, achieving up to $1.45\times$ speedup. We also outperform PyTorch ecosystem frameworks, including ExecuTorch, TorchInductor, and eager execution, demonstrating the effectiveness of scalable vectorization in a production compiler setting. A simulator-based study further shows that the generated code scales with increasing SVE vector length on compute-bound workloads, supporting performance portability across hardware configurations.
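The idea of a vector-length-aware packed layout can be sketched in plain Python. This is an illustrative assumption, not the paper's actual layout or IREE's implementation: a matrix is relaid into a grid of fixed-size tiles with zero padding (similar in spirit to MLIR's `tensor.pack`), and the innermost tile dimension `n0` is chosen as one f32 SVE vector, `4 * vscale`, so the tile shape depends on the hardware vector length. The tile choice `m0 = 8, k0 = 1` and all function names are hypothetical. A tiled matmul over the packed operands, unpacked afterwards, matches a naive matmul for every vector length.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Packed = List[List[Matrix]]  # grid of tiles: packed[tile_row][tile_col][r][c]


def tile_sizes(vl_bits: int) -> Tuple[int, int, int]:
    """Hypothetical tile choice: m0 and k0 fixed, n0 = one f32 SVE vector."""
    if vl_bits % 128 != 0 or not 128 <= vl_bits <= 2048:
        raise ValueError("invalid SVE vector length")
    return 8, 1, 4 * (vl_bits // 128)


def pack(a: Matrix, tile_r: int, tile_c: int) -> Packed:
    """Relayout a rows x cols matrix into tile_r x tile_c tiles, zero-padding the edges."""
    rows, cols = len(a), len(a[0])
    outer_r, outer_c = -(-rows // tile_r), -(-cols // tile_c)  # ceil division
    return [[[[a[ti * tile_r + r][tj * tile_c + c]
               if ti * tile_r + r < rows and tj * tile_c + c < cols else 0.0
               for c in range(tile_c)] for r in range(tile_r)]
             for tj in range(outer_c)] for ti in range(outer_r)]


def unpack(p: Packed, rows: int, cols: int) -> Matrix:
    """Inverse of pack: drop padding and restore the row-major matrix."""
    tile_r, tile_c = len(p[0][0]), len(p[0][0][0])
    return [[p[i // tile_r][j // tile_c][i % tile_r][j % tile_c]
             for j in range(cols)] for i in range(rows)]


def packed_matmul(pa: Packed, pb: Packed) -> Packed:
    """C = A @ B on packed operands: pa has m0 x k0 tiles, pb has k0 x n0 tiles."""
    m1, k1, n1 = len(pa), len(pa[0]), len(pb[0])
    m0, k0, n0 = len(pa[0][0]), len(pa[0][0][0]), len(pb[0][0][0])
    pc = [[[[0.0] * n0 for _ in range(m0)] for _ in range(n1)] for _ in range(m1)]
    for i in range(m1):
        for j in range(n1):
            acc = pc[i][j]
            for kk in range(k1):
                ta, tb = pa[i][kk], pb[kk][j]
                for r in range(m0):
                    for c in range(n0):  # innermost dim spans one scalable vector
                        acc[r][c] += sum(ta[r][t] * tb[t][c] for t in range(k0))
    return pc


def naive_matmul(a: Matrix, b: Matrix) -> Matrix:
    k = len(b)
    return [[sum(a[i][t] * b[t][j] for t in range(k)) for j in range(len(b[0]))]
            for i in range(len(a))]
```

Zero padding is safe here because padded `K` entries contribute zero products, and padded `M`/`N` positions are discarded by `unpack`. The point of the sketch is that the packed shape (for example, the number of `N` tiles) changes with `vl_bits`, so a compiler targeting SVE must express such layouts in terms of `vscale` rather than fixed constants.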

Problem

Research questions and friction points this paper is trying to address.

vector-length-agnostic

scalable vectorization

compiler code generation

data layout

ML compilation

Innovation

Methods, ideas, or system contributions that make the work stand out.

vector-length-agnostic

scalable vectorization

packed layout

MLIR/IREE

SVE