🤖 AI Summary
To address the challenge of optimizing dynamically shaped models, particularly large language models (LLMs), across heterogeneous backends, this paper proposes Relax, a unified compiler abstraction that enables cross-level optimization across computational graphs, loop-level tensor programs, and external library calls. Our method introduces: (1) first-class symbolic shape annotations that track dynamic shapes globally across the program, and (2) a single cross-level representation in which this symbolic shape information drives dynamic shape-aware optimizations. Evaluated on LLMs across multiple GPUs, the approach delivers performance competitive with state-of-the-art systems. It also enables deployment of emerging models to a broader set of environments, including mobile phones, embedded devices, and web browsers, which broadens the practical reach of dynamically shaped models in edge and client-side settings.
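As a rough illustration of symbolic shape inference feeding a shape-aware optimization, the sketch below models each dimension as either an integer or a symbol name such as `n` (a sequence length unknown until runtime). It is a minimal, hypothetical sketch in plain Python, not Relax's API: `TensorType`, `infer_matmul`, `infer_elementwise`, `can_share_buffer`, and `num_elements` are illustrative names, and shapes are compared structurally rather than with the symbolic arithmetic a real compiler would need.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union

# Hypothetical encoding: a dimension is a concrete int or a symbolic variable name.
Dim = Union[int, str]


@dataclass(frozen=True)
class TensorType:
    shape: Tuple[Dim, ...]
    dtype: str


def infer_matmul(a: TensorType, b: TensorType) -> TensorType:
    """2-D (m, k) @ (k, n) -> (m, n); the inner dimensions must be provably equal."""
    (m, k1), (k2, n) = a.shape, b.shape
    if k1 != k2:  # structural equality only; a real system would reason symbolically
        raise TypeError(f"cannot prove {k1} == {k2}")
    return TensorType((m, n), a.dtype)


def infer_elementwise(a: TensorType, b: TensorType) -> TensorType:
    """Elementwise ops require identical symbolic shapes here (no broadcasting)."""
    if a.shape != b.shape or a.dtype != b.dtype:
        raise TypeError(f"shape mismatch: {a.shape} vs {b.shape}")
    return a


def can_share_buffer(a: TensorType, b: TensorType) -> bool:
    """Shape-aware memory planning: identical symbolic shape and dtype imply the
    same byte size for every runtime value of the symbols, so a buffer can be reused."""
    return a.shape == b.shape and a.dtype == b.dtype


def num_elements(t: TensorType, env: Dict[str, int]) -> int:
    """Evaluate a symbolic shape once runtime values of its symbols are bound."""
    total = 1
    for d in t.shape:
        total *= env[d] if isinstance(d, str) else d
    return total


# "n" is the sequence length, unknown until the model actually runs.
x = TensorType(("n", 4096), "float16")
w = TensorType((4096, 4096), "float16")
h = infer_matmul(x, w)       # shape ('n', 4096)
g = infer_elementwise(h, x)  # same symbolic shape as h
print(h.shape, can_share_buffer(x, h), num_elements(h, {"n": 7}))
# prints: ('n', 4096) True 28672
```

Because `x` and `h` carry the same symbolic shape, a planner can prove they occupy the same number of bytes for any value of `n` and reuse one allocation, without knowing `n` at compile time.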
📝 Abstract
Dynamic shape computations have become critical in modern machine learning workloads, especially in emerging large language models. The success of these models has driven the demand for their universal deployment across a diverse set of backend environments. In this paper, we present Relax, a compiler abstraction for optimizing end-to-end dynamic machine learning workloads. Relax introduces a cross-level abstraction that encapsulates computational graphs, loop-level tensor programs, and external library calls in a single representation. Relax also introduces first-class symbolic shape annotations to track dynamic shape computations globally across the program, enabling dynamic shape-aware cross-level optimizations. We build an end-to-end compilation framework using the proposed approach to optimize dynamic shape models. Experimental results on LLMs show that Relax delivers performance competitive with state-of-the-art systems across various GPUs and enables deployment of emerging models to a broader set of environments, including mobile phones, embedded devices, and web browsers.
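To make the idea of a single cross-level representation more concrete, here is a hypothetical plain-Python sketch (not Relax's actual IR or API) in which one function body mixes a call to an external library kernel with calls to loop-level tensor programs, so one pass can see both and fuse adjacent loop-level steps. `Call`, `TENSOR_PROGRAMS`, `LIBRARIES`, `fuse_add_relu`, and `run` are illustrative names; `library_matmul` merely stands in for a vendor kernel.

```python
from dataclasses import dataclass
from typing import Dict, List

Matrix = List[List[float]]


@dataclass
class Call:
    kind: str        # "tensor_program" (loop-level) or "library" (external kernel)
    name: str        # key into the registry for that level
    args: List[str]  # names of input variables
    out: str         # name of the output variable


# Loop-level tensor programs: explicit loops over sizes known only at runtime.
def add(a: Matrix, b: Matrix) -> Matrix:
    out = [[0.0] * len(a[0]) for _ in range(len(a))]
    for i in range(len(a)):
        for j in range(len(a[0])):
            out[i][j] = a[i][j] + b[i][j]
    return out


def relu(a: Matrix) -> Matrix:
    return [[max(0.0, v) for v in row] for row in a]


def add_relu(a: Matrix, b: Matrix) -> Matrix:
    # What fusion produces: one loop nest and no intermediate tensor.
    return [[max(0.0, x + y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


TENSOR_PROGRAMS = {"add": add, "relu": relu, "add_relu": add_relu}


def library_matmul(a: Matrix, b: Matrix) -> Matrix:
    # Hypothetical stand-in for an external vendor GEMM kernel.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


LIBRARIES = {"matmul": library_matmul}


def fuse_add_relu(body: List[Call], ret: str) -> List[Call]:
    """Fuse add -> relu when the add result is used only by that relu."""
    fused, i = [], 0
    while i < len(body):
        c = body[i]
        nxt = body[i + 1] if i + 1 < len(body) else None
        later_uses = [a for call in body[i + 2:] for a in call.args]
        if (nxt is not None and c.kind == nxt.kind == "tensor_program"
                and c.name == "add" and nxt.name == "relu" and nxt.args == [c.out]
                and c.out not in later_uses and c.out != ret):
            fused.append(Call("tensor_program", "add_relu", c.args, nxt.out))
            i += 2
        else:
            fused.append(c)
            i += 1
    return fused


def run(body: List[Call], inputs: Dict[str, Matrix], ret: str) -> Matrix:
    env = dict(inputs)
    for c in body:
        table = TENSOR_PROGRAMS if c.kind == "tensor_program" else LIBRARIES
        env[c.out] = table[c.name](*[env[a] for a in c.args])
    return env[ret]


# One function mixing a library call with loop-level tensor programs.
program = [
    Call("library", "matmul", ["x", "w"], "h"),
    Call("tensor_program", "add", ["h", "bias"], "s"),
    Call("tensor_program", "relu", ["s"], "y"),
]
inputs = {"x": [[1.0, -2.0]], "w": [[1.0, 0.0], [0.0, 1.0]], "bias": [[0.5, 0.5]]}
fused = fuse_add_relu(program, "y")
print([c.name for c in fused], run(fused, inputs, "y"))
# prints: ['matmul', 'add_relu'] [[1.5, 0.0]]
```

The loop-level programs size themselves from their inputs at runtime, so the same program handles any number of rows. Fusing `add` and `relu` removes the intermediate `s`, while the matmul remains an opaque library call that the pass leaves untouched.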