Relax: Composable Abstractions for End-to-End Dynamic Machine Learning

📅 2023-11-01

🏛️ arXiv.org

📈 Citations: 5

✨ Influential: 0

🤖 AI Summary

This paper addresses the challenge of compiling and optimizing dynamically shaped models, particularly large language models (LLMs), for a diverse set of backends. It proposes Relax, a compiler abstraction that represents computational graphs, loop-level tensor programs, and external library calls in a single representation, so that optimizations can cross these levels. Two ideas are central: (1) first-class symbolic shape annotations that track dynamic shape computations globally across the program, and (2) dynamic shape-aware cross-level optimizations built on those annotations. On LLM workloads, Relax delivers performance competitive with state-of-the-art systems across various GPUs, and it enables deployment of emerging models to a broader set of environments, including mobile phones, embedded devices, and web browsers.

📝 Abstract

Dynamic shape computations have become critical in modern machine learning workloads, especially in emerging large language models. The success of these models has driven the demand for their universal deployment across a diverse set of backend environments. In this paper, we present Relax, a compiler abstraction for optimizing end-to-end dynamic machine learning workloads. Relax introduces a cross-level abstraction that encapsulates computational graphs, loop-level tensor programs, and external library calls in a single representation. Relax also introduces first-class symbolic shape annotations to track dynamic shape computations globally across the program, enabling dynamic shape-aware cross-level optimizations. We build an end-to-end compilation framework using the proposed approach to optimize dynamic shape models. Experimental results on LLMs show that Relax delivers performance competitive with state-of-the-art systems across various GPUs and enables deployment of emerging models to a broader set of emerging environments, including mobile phones, embedded devices, and web browsers.
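
To make the idea of first-class symbolic shape annotations more concrete, the sketch below shows a toy shape-inference pass in plain Python. It is not Relax's API: `TensorType`, `infer_matmul`, and `bind` are hypothetical names invented for illustration, and the only assumption carried over from the abstract is that a dimension may be a symbolic variable (here the string "n") that is tracked through operators and resolved once the real input size is known.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union

# A dimension is either a static size (int) or a symbolic variable name (str).
Dim = Union[int, str]


@dataclass(frozen=True)
class TensorType:
    """Hypothetical tensor annotation: a shape that may contain symbols."""
    shape: Tuple[Dim, ...]
    dtype: str = "float32"


def infer_matmul(a: TensorType, b: TensorType) -> TensorType:
    """Shape rule for a 2-D matmul: (m, k) x (k, n) -> (m, n).

    Symbolic dimensions are compared by name, so ("n", 4096) x (4096, 11008)
    type-checks without knowing the concrete value of "n".
    """
    (m, k_left), (k_right, n) = a.shape, b.shape
    if k_left != k_right:
        raise ValueError(f"inner dimensions differ: {k_left} vs {k_right}")
    return TensorType((m, n), a.dtype)


def bind(t: TensorType, env: Dict[str, int]) -> Tuple[int, ...]:
    """Substitute concrete values for symbolic dimensions at run time."""
    return tuple(env[d] if isinstance(d, str) else d for d in t.shape)


# "n" stands for a sequence length that is unknown at compile time.
x = TensorType(("n", 4096))
w = TensorType((4096, 11008))
y = infer_matmul(x, w)

print(y.shape)             # the symbol "n" is carried through to the output
print(bind(y, {"n": 7}))   # concrete shape once the input length is known
```

Because the output annotation still mentions "n", a later pass in this toy setting could tell that `y` and `x` share their leading dimension, which is the kind of global relationship the abstract attributes to symbolic shape tracking.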

Problem

Research questions and friction points this paper is trying to address.

Optimizing dynamic machine learning workloads

Cross-level abstraction for computational graphs

Deploying models across diverse backend environments

Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic shape-aware optimizations

Cross-level abstraction representation

End-to-end compilation framework
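
The cross-level representation listed above can be pictured with a small Python sketch in which one program mixes three kinds of calls: a graph-level operator, a loop-level tensor program, and an external library routine. Everything here is a hypothetical illustration rather than Relax's actual IR: the `Call` record, the `TABLES` registry, and the name `vendor.sum` are assumptions made only to show the three levels living in a single program.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Call:
    """One step of a toy program; `kind` says which level it belongs to."""
    kind: str               # "graph_op", "tensor_prog", or "extern"
    target: str             # name looked up in the table for that kind
    args: Tuple[str, ...]   # names of input values
    out: str                # name of the output value


def loop_add_one(x: List[float]) -> List[float]:
    """Stand-in for a loop-level tensor program: an explicit loop nest."""
    out = [0.0] * len(x)
    for i in range(len(x)):
        out[i] = x[i] + 1.0
    return out


# One registry per level; a real compiler would lower or fuse across them.
TABLES: Dict[str, Dict[str, Callable]] = {
    "graph_op": {"relu": lambda x: [max(v, 0.0) for v in x]},
    "tensor_prog": {"add_one": loop_add_one},
    "extern": {"vendor.sum": lambda x: [sum(x)]},  # pretend library call
}


def run(program: List[Call], inputs: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Execute the calls in order, dispatching on each call's level."""
    env = dict(inputs)
    for call in program:
        fn = TABLES[call.kind][call.target]
        env[call.out] = fn(*(env[name] for name in call.args))
    return env


program = [
    Call("graph_op", "relu", ("x",), "a"),
    Call("tensor_prog", "add_one", ("a",), "b"),
    Call("extern", "vendor.sum", ("b",), "c"),
]
result = run(program, {"x": [-1.0, 2.0, 3.0]})
print(result["c"])
```

Keeping all three call kinds in one list is the point of the sketch: a pass that walks `program` sees every level at once and could, for example, decide to replace the `graph_op` and `tensor_prog` steps with a single fused loop.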
