🤖 AI Summary
Optimized GPU kernels for ML workloads, such as convolution, GEMM, and attention, are increasingly produced by manual tuning, compiler autotuning, or LLM-based synthesis; however, none of these approaches offers a formal guarantee that the optimized variant is functionally equivalent to the original. Method: This paper introduces VOLTA, the first formal equivalence-verification framework for GPU kernels. VOLTA constructs precise semantic models of GPU kernels and applies a program-equivalence checking algorithm that is sound and, for a well-defined class of kernels covering the programs of interest, complete. Contribution/Results: VOLTA supports mainstream deep learning and large language model computation patterns, enabling fully automated verification of functional equivalence across optimizations produced by hand, by compilers, and by LLMs. Experimental evaluation shows that VOLTA efficiently detects previously unknown semantic inconsistencies in state-of-the-art optimized kernels. By providing a rigorous formal foundation, VOLTA advances trustworthiness in low-level operator optimization for AI systems.
📝 Abstract
With the rapid progress of deep learning and large language models (LLMs), companies now spend enormous sums executing GPU kernels. These kernels have therefore become prime targets for aggressive optimization. Recent efforts increasingly leverage LLMs to generate GPU kernels, but these efforts make no formal guarantees about the kernels they produce. We present the first equivalence checker for GPU kernels and use it to formally verify the correctness of machine learning (ML) kernels optimized by hand, by LLMs, and by compilers. We show that our equivalence checker is sound and, for a well-defined class of GPU kernels that includes the programs of interest, complete. Our implementation, VOLTA, can verify ML computations such as convolutions, matrix multiplications, and various attention mechanisms.
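To make the problem concrete, here is a toy sketch (not VOLTA's actual algorithm, whose internals are not described here): equivalence checking asks whether an optimized kernel computes the same function as the original for all inputs. The Python functions below stand in for a naive matrix-multiply kernel and a loop-interchanged "optimized" variant, a common cache-friendly transformation. The randomized check at the end only gathers evidence of equivalence; a formal checker like VOLTA would instead prove it symbolically for all inputs.

```python
# Toy illustration of GPU-kernel equivalence checking (hypothetical
# example, not VOLTA's method): compare a naive matmul against a
# loop-interchanged variant that an optimizer might produce.
import random

def matmul_naive(A, B, n):
    # C[i][j] = sum over k of A[i][k] * B[k][j], classic i-j-k order.
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C

def matmul_ikj(A, B, n):
    # Loop-interchanged i-k-j variant: same function, different
    # traversal order (better locality on real hardware).
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            a = A[i][k]
            for j in range(n):
                C[i][j] += a * B[k][j]
    return C

def check_equivalence(n=3, trials=500, seed=0):
    # Randomized differential testing: evidence only, not a proof.
    # A formal equivalence checker reasons over ALL inputs instead.
    rng = random.Random(seed)
    for _ in range(trials):
        A = [[rng.randint(-4, 4) for _ in range(n)] for _ in range(n)]
        B = [[rng.randint(-4, 4) for _ in range(n)] for _ in range(n)]
        if matmul_naive(A, B, n) != matmul_ikj(A, B, n):
            return False
    return True

print(check_equivalence())  # prints True
```

Note the gap this sketch highlights: testing over integer inputs happens to be conclusive here because loop interchange preserves the summation exactly, but for floating-point kernels, or for transformations that reorder reductions, equivalence is subtler, which is precisely where a sound formal checker earns its keep.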