🤖 AI Summary
Optimizing tensor programs across the GPU's hierarchical compute architecture (kernels, thread blocks, and threads) remains challenging due to fragmented optimization scopes and insufficient cross-level coordination. To address this, we propose Mirage, the first multi-level superoptimizer explicitly designed for this hierarchy. Its core contributions are: (1) a unified intermediate representation, μGraph, enabling joint modeling of algebraic transformations, scheduling optimizations, and custom CUDA kernel generation; (2) abstraction-guided pruning and probabilistic equivalence verification, ensuring correctness while drastically improving search efficiency; and (3) end-to-end exploration of the multi-level scheduling space with automatic code generation. Evaluation on mainstream DNN models shows Mirage achieves 1.1–2.9× speedup over state-of-the-art optimizers including TVM and Ansor. The open-source implementation is publicly available.
📝 Abstract
We introduce Mirage, the first multi-level superoptimizer for tensor programs. A key idea in Mirage is μGraphs, a uniform representation of tensor programs at the kernel, thread block, and thread levels of the GPU compute hierarchy. μGraphs enable Mirage to discover novel optimizations that combine algebraic transformations, schedule transformations, and generation of new custom kernels. To navigate the large search space, Mirage introduces a pruning technique based on abstraction that significantly reduces the search space and provides a certain optimality guarantee. To ensure that the optimized μGraph is equivalent to the input program, Mirage introduces a probabilistic equivalence verification procedure with strong theoretical guarantees. Our evaluation shows that Mirage outperforms existing approaches by 1.1–2.9× even for DNNs that are widely used and heavily optimized. Mirage is publicly available at https://github.com/mirage-project/mirage.
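The probabilistic equivalence verification mentioned above can be illustrated with a minimal sketch of the general idea (this is not Mirage's actual procedure, and the function names here are hypothetical): evaluate two candidate tensor programs on random inputs over a finite field Z_P and check that the outputs agree. For programs built from linear and multilinear operators, agreement on enough random trials implies equivalence with high probability.

```python
import numpy as np

P = 65521  # a prime modulus small enough that int64 matmuls never overflow

def run_original(A, B, C):
    # (A @ B) @ C, reducing mod P after each kernel
    return (A @ B % P) @ C % P

def run_optimized(A, B, C):
    # A @ (B @ C): an algebraically transformed schedule of the same program
    return A @ (B @ C % P) % P

def probably_equivalent(f, g, trials=8, n=4, seed=0):
    """Return True if f and g agree on `trials` random inputs over Z_P."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        A, B, C = (rng.integers(0, P, (n, n), dtype=np.int64) for _ in range(3))
        if not np.array_equal(f(A, B, C), g(A, B, C)):
            return False
    return True
```

A program that computes something different, such as `(A @ C) @ B`, would be rejected with overwhelming probability after a handful of trials.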