Event Tensor: A Unified Abstraction for Compiling Dynamic Megakernel

📅 2026-04-14

🤖 AI Summary
This work addresses inefficiencies in large language model (LLM) inference on modern GPUs. Kernel launch overhead and coarse-grained synchronization limit parallelism across kernels, and existing megakernel approaches struggle to fuse computations that involve dynamic shapes and data-dependent control flow. To overcome these limitations, the authors introduce Event Tensor, the first unified compiler abstraction tailored for dynamic megakernels. It explicitly models dependencies between tiles and treats dynamism as a first-class feature. They further develop the Event Tensor Compiler (ETC), which combines static and dynamic scheduling transformations to automatically generate high-performance persistent kernels. On LLM inference workloads, this approach achieves state-of-the-art latency while substantially reducing system warm-up overhead.

📝 Abstract
Modern GPU workloads, especially large language model (LLM) inference, suffer from kernel launch overheads and coarse synchronization that limit inter-kernel parallelism. Recent megakernel techniques fuse multiple operators into a single persistent kernel to eliminate launch gaps and expose inter-kernel parallelism, but struggle to handle dynamic shapes and data-dependent computation in real workloads. We present Event Tensor, a unified compiler abstraction for dynamic megakernels. Event Tensor encodes dependencies between tiled tasks, and enables first-class support for both shape and data-dependent dynamism. Built atop this abstraction, our Event Tensor Compiler (ETC) applies static and dynamic scheduling transformations to generate high-performance persistent kernels. Evaluations show that ETC achieves state-of-the-art LLM serving latency while significantly reducing system warmup overhead.
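The abstract describes Event Tensor only at a high level: it encodes dependencies between tiled tasks and supports shape- and data-dependent dynamism inside a single persistent kernel. As a rough illustration of that general idea, the Python sketch below simulates a persistent worker loop in which each tile task waits on an event counter that its producer tiles increment when they finish. It is not the paper's API, data layout, or runtime. All names (`EventTensor`, `Task`, `build_tasks`, `run_persistent`) and the two-stage "matmul tiles, then reduce" pipeline are hypothetical, and the tile count is computed at run time from a sequence length to mimic a dynamic shape.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Key = Tuple[str, int]  # hypothetical (stage name, tile index) identifier


@dataclass
class EventTensor:
    """Hypothetical table of event counters, one per task.

    A task becomes ready once its counter reaches the expected number
    of producer signals (0 means it is ready at launch)."""
    expected: Dict[Key, int]
    counts: Dict[Key, int] = field(default_factory=dict)

    def signal(self, key: Key) -> bool:
        # A producer finished: bump the consumer's counter and report
        # whether the consumer has just become ready.
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key] == self.expected[key]


@dataclass
class Task:
    key: Key
    run: Callable[[], None]
    notifies: List[Key]  # events this task signals when it completes


def build_tasks(seq_len: int, tile: int, log: List[str]):
    # Ceiling division: the number of tiles depends on the runtime shape.
    n_tiles = -(-seq_len // tile)
    tasks: Dict[Key, Task] = {}
    expected: Dict[Key, int] = {}
    # Stage 1: independent "matmul" tiles, each notifying the reduce task.
    for i in range(n_tiles):
        tasks[("mm", i)] = Task(("mm", i), lambda i=i: log.append(f"mm{i}"),
                                [("reduce", 0)])
        expected[("mm", i)] = 0
    # Stage 2: one reduce task that waits for all n_tiles producers.
    tasks[("reduce", 0)] = Task(("reduce", 0), lambda: log.append("reduce"), [])
    expected[("reduce", 0)] = n_tiles
    return tasks, EventTensor(expected)


def run_persistent(tasks: Dict[Key, Task], events: EventTensor) -> int:
    """One simulated persistent worker: pop a ready task, run it, then
    signal its dependents instead of waiting for a kernel boundary."""
    ready = deque(k for k, n in events.expected.items() if n == 0)
    done = 0
    while ready:
        key = ready.popleft()
        tasks[key].run()
        done += 1
        for dep in tasks[key].notifies:
            if events.signal(dep):
                ready.append(dep)
    return done


log: List[str] = []
tasks, events = build_tasks(seq_len=300, tile=128, log=log)
print(run_persistent(tasks, events))  # 4
print(log)  # ['mm0', 'mm1', 'mm2', 'reduce']
```

The per-task counters let the reduce task start as soon as its own producers have signaled, rather than after a global barrier at a kernel boundary. A real GPU megakernel would need atomic counter updates and memory fences shared across thread blocks, which this single-threaded simulation does not model.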
Problem

Research questions and friction points this paper is trying to address.

kernel launch overhead
inter-kernel parallelism
dynamic shapes
data-dependent computation
megakernel
Innovation

Methods, ideas, or system contributions that make the work stand out.

Event Tensor
dynamic megakernel
compiler abstraction
LLM inference
persistent kernel
Authors

Hongyi Jin, Carnegie Mellon University (machine learning systems, compilers)
Bohan Hou, PhD in Computer Science, Carnegie Mellon University (Machine Learning, Systems)
Guanjie Wang, Shanghai Lixin University of Accounting and Finance (Scientific Computing, Uncertainty Quantification)
Ruihang Lai, Carnegie Mellon University (Machine Learning Systems)
Jinqi Chen, NVIDIA
Zihao Ye, NVIDIA, University of Washington (Compilers, Machine Learning Systems)
Yaxing Cai, Shanghai Jiao Tong University
Yixin Dong, Ph.D. Student, Carnegie Mellon University (Machine Learning, Machine Learning Systems, Large Language Models)
Xinhao Cheng, CS PhD student at Carnegie Mellon University (Computer Systems, Machine Learning Systems)
Zhihao Zhang, CSD, Carnegie Mellon University (Deep Learning, MLSys)
Yilong Zhao, Ph.D. student, UC Berkeley (Computer Systems, Microarchitecture, Machine Learning Systems)
Yingyi Huang, NVIDIA
Lijie Yang, Princeton University
Jinchen Jiang, Tsinghua University
Gabriele Oliaro, Carnegie Mellon University, Snowflake AI Research (Machine Learning, Distributed Systems, Parallel Computing, Networking)
Jianan Ji, Carnegie Mellon University
Xupeng Miao, Purdue University (Machine Learning Systems, Data Management)
Vinod Grover, Sr. Distinguished Engineer, NVIDIA Corporation (Programming Languages, Compilers, Deep Learning)
Todd C. Mowry, Professor of Computer Science, Carnegie Mellon University (computer architecture, compilers, operating systems, parallel processing, database performance)
Zhihao Jia, Assistant Professor of Computer Science, Carnegie Mellon University (Computer Systems, Machine Learning, Deep Neural Networks)
Tianqi Chen, Carnegie Mellon University (Machine Learning, Systems)