🤖 AI Summary
Testing the optimization phases of deep learning compilers is challenging because typical test cases lack optimization awareness and fail to trigger deep optimization paths. This paper proposes OATest, the first optimization-aware test generation method targeting compiler optimization stages. It broadens path exploration by integrating optimization patterns into seed computational graphs, introduces an edge-reuse strategy to preserve contextual consistency between patterns and their surrounding graphs, and adds auxiliary layers to repair broken constraints. Furthermore, OATest supports differential testing across the TVM and ONNX Runtime backends to detect semantic inconsistencies. Experimental evaluation demonstrates that OATest significantly improves code coverage and discovers 58 previously unknown bugs, 36 of which have been confirmed or fixed by developers, substantially strengthening reliability and security verification for compiler optimization phases.
📝 Abstract
Deep Learning (DL) compilers have been widely utilized to optimize DL models for efficient deployment across various hardware. Given their vital role in the DL ecosystem, ensuring their reliability and security is critical. However, existing approaches have limitations in testing optimization stages, the core functionality of DL compilers, due to the difficulty of generating optimization-aware tests. In this paper, we propose OATest, a novel approach for synthesizing optimization-aware computational graphs. OATest extracts patterns from documented optimization tests and incorporates them into seed computational graphs, enabling broader exploration of optimization paths. To guarantee the optimization awareness of the generated graphs, OATest introduces an edge-reuse strategy that establishes strong connections between patterns and their contexts. Additionally, to address the validity challenge for the generated graphs, OATest employs an auxiliary-layer addition strategy to resolve broken constraints. Equipped with two distinct test oracles, OATest applies differential testing to evaluate two widely used DL compilers, TVM and ONNX Runtime. Our experimental results show that OATest outperforms the state-of-the-art method, detecting more bugs and achieving higher code coverage in both TVM and ONNX Runtime. Additionally, OATest uncovers 58 previously unknown bugs, 36 of which have been confirmed or fixed by developers.
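The differential-testing idea behind OATest's oracle can be sketched as follows. This is a minimal illustration, not the paper's implementation: the two stub functions stand in for compiling and running the same computational graph on TVM and ONNX Runtime, and the tolerance values are assumptions.

```python
import numpy as np

# Hypothetical stand-ins for the two backends under test (TVM and ONNX
# Runtime). In a real harness, each would compile and execute the same
# generated computational graph.
def run_backend_a(x):
    # e.g., a fused multiply-add produced by one optimization pipeline
    return x * 2.0 + 1.0

def run_backend_b(x):
    # semantically equivalent computation, different evaluation order
    return (x + 0.5) * 2.0

def differential_test(inputs, rtol=1e-5, atol=1e-6):
    """Run both backends on each input; collect semantic inconsistencies,
    i.e., cases where outputs disagree beyond floating-point tolerance."""
    inconsistencies = []
    for x in inputs:
        out_a = run_backend_a(x)
        out_b = run_backend_b(x)
        if not np.allclose(out_a, out_b, rtol=rtol, atol=atol):
            inconsistencies.append((x, out_a, out_b))
    return inconsistencies

# Generate a small batch of random test tensors and compare the backends.
rng = np.random.default_rng(0)
tests = [rng.standard_normal((2, 3)).astype(np.float32) for _ in range(10)]
bugs = differential_test(tests)
print(f"{len(bugs)} semantic inconsistencies found")
```

Since the two stub backends compute the same function, no inconsistencies are reported here; in OATest, any disagreement between the real compilers on an optimization-aware graph signals a candidate bug.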