🤖 AI Summary
Problem: Compiler autotuning research lacks standardized black-box optimization benchmarks, which hinders systematic evaluation of Bayesian optimization and related methods on structured search spaces, particularly those with discrete, conditional, and permutation parameters and with multi-fidelity or multi-objective evaluations.
Method: We introduce CATBench, a unified, containerized, and reproducible benchmark suite designed specifically for compiler autotuning, built on state-of-the-art compilers such as TACO and RISE/ELEVATE. The framework models the structural complexity of their parameter spaces and natively supports constraint handling, multi-objective optimization, and multi-fidelity evaluation.
Contribution/Results: The benchmark enables fair, like-for-like comparison of state-of-the-art optimization algorithms on real-world compiler tasks. Empirical evaluation of several such algorithms reveals their strengths and weaknesses in structured search spaces, and the unified setup improves method comparability and experimental reproducibility in compiler autotuning research.
📝 Abstract
Bayesian optimization is a powerful method for automating the tuning of compilers. The complex landscape of autotuning presents a myriad of rarely considered structural challenges for black-box optimizers, and the lack of standardized benchmarks has limited the study of Bayesian optimization within the domain. To address this, we present CATBench, a comprehensive benchmarking suite that captures the complexities of compiler autotuning, ranging from discrete, conditional, and permutation parameter types to known and unknown binary constraints, as well as both multi-fidelity and multi-objective evaluations. The benchmarks in CATBench span a range of machine learning-oriented computations, from tensor algebra to image processing and clustering, and use state-of-the-art compilers, such as TACO and RISE/ELEVATE. CATBench offers a unified interface for evaluating Bayesian optimization algorithms, promoting reproducibility and innovation through an easy-to-use, fully containerized setup of both surrogate and real-world compiler optimization tasks. We validate CATBench on several state-of-the-art algorithms, revealing their strengths and weaknesses and demonstrating the suite's potential for advancing both Bayesian optimization and compiler autotuning research.
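To make the parameter structure described in the abstract concrete, the following is a minimal, self-contained sketch of such a search space. It is not the CATBench interface: every name, range, and constraint here (`TILE_SIZES`, `sample_config`, `is_feasible`, `evaluate`, the tile footprint limit) is a hypothetical chosen for illustration, and the objective is a toy formula rather than a real compiler run.

```python
import random

# Hypothetical search space for a tiled loop nest; names and ranges are illustrative only.
TILE_SIZES = [1, 2, 4, 8, 16, 32]  # ordinal (discrete) parameter values
LOOPS = ("i", "j", "k")            # permutation parameter: the loop order


def is_feasible(config):
    """Known constraint: can be checked before evaluating (an assumed footprint limit)."""
    return config["tile_i"] * config["tile_j"] <= 256


def sample_config(rng):
    """Draw random configurations until one satisfies the known constraint."""
    while True:
        config = {
            "tile_i": rng.choice(TILE_SIZES),
            "tile_j": rng.choice(TILE_SIZES),
            "loop_order": tuple(rng.sample(LOOPS, k=len(LOOPS))),
            "parallel": rng.choice([False, True]),
        }
        # Conditional parameter: chunk_size only exists when parallel is enabled.
        if config["parallel"]:
            config["chunk_size"] = rng.choice([1, 4, 16])
        if is_feasible(config):
            return config


def evaluate(config):
    """Toy stand-in for compiling and running a kernel.

    Returns (runtime, energy) as two objectives, or None to mimic an unknown
    constraint that is only discovered when the configuration is evaluated.
    """
    if config["loop_order"][0] == "k" and config["tile_i"] == 32:
        return None  # hidden failure, e.g. the generated code does not compile
    runtime = 100.0 / (config["tile_i"] + config["tile_j"])
    if config["parallel"]:
        runtime /= 2.0
    energy = runtime * (1.5 if config["parallel"] else 1.0)
    return runtime, energy


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(5):
        cfg = sample_config(rng)
        print(cfg, evaluate(cfg))
```

Random sampling stands in for the optimizer here. A Bayesian optimization method would replace `sample_config` with a model-guided proposal step, and would have to cope with the same structure: a parameter that appears only under a condition, an ordering rather than a numeric value, and evaluations that can fail without warning.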