🤖 AI Summary
Tensor compilers lack unified, realistic, multi-framework benchmarks of computational graphs. Method: We introduce GraphNet, a large-scale standardized dataset of 2.7K real-world computational graphs spanning six task categories and multiple deep learning frameworks (e.g., PyTorch, TensorFlow). We propose two evaluation metrics, the Speedup Score S(t) and the Error-aware Speedup Score ES(t), presented as the first to jointly quantify execution speedup, numerical correctness, and error sensitivity. We further design a cross-framework graph extraction methodology and an automated evaluation toolchain supporting end-to-end performance assessment of mainstream compilers, including CINN and TorchInductor. Contribution/Results: Empirical validation on CV and NLP tasks demonstrates GraphNet's effectiveness in exposing optimization bottlenecks across diverse graph structures. All code, data, and tools are publicly released.
📝 Abstract
We introduce GraphNet, a dataset of 2.7K real-world deep learning computational graphs with rich metadata, spanning six major task categories across multiple deep learning frameworks. To evaluate tensor compiler performance on these samples, we propose the benchmark metric Speedup Score S(t), which jointly considers runtime speedup and execution correctness under tunable tolerance levels, offering a reliable measure of general optimization capability. Furthermore, we extend S(t) to the Error-aware Speedup Score ES(t), which incorporates error information and helps compiler developers identify key performance bottlenecks. In this report, we benchmark the default tensor compilers, CINN for PaddlePaddle and TorchInductor for PyTorch, on computer vision (CV) and natural language processing (NLP) samples to demonstrate the practicality of GraphNet. The full construction pipeline with graph extraction and compiler evaluation tools is available at https://github.com/PaddlePaddle/GraphNet.
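To make the idea of a tolerance-gated metric concrete, the sketch below shows one plausible shape for a per-sample Speedup Score: a compiled graph earns its measured speedup only if its outputs match the reference (eager) outputs within a tunable tolerance t, and scores zero otherwise. This is a minimal illustration under stated assumptions, not the paper's actual S(t) formula; the function name `speedup_score`, the relative-error check, and the zero-on-failure rule are all assumptions for illustration.

```python
def speedup_score(eager_time, compiled_time, eager_out, compiled_out, t=1e-3):
    """Hypothetical per-sample score: speedup gated by a correctness check.

    Assumption: the sample scores its raw speedup (eager_time / compiled_time)
    only when every output element agrees with the eager reference within
    relative tolerance t; any tolerance violation zeroes the score.
    """
    eps = 1e-12  # guard against division by zero in the relative error
    max_rel_err = max(
        abs(a - b) / (abs(a) + eps)
        for a, b in zip(eager_out, compiled_out)
    )
    if max_rel_err > t:
        return 0.0  # numerically incorrect: no credit for being fast
    return eager_time / compiled_time
```

Tightening t makes the metric stricter about numerical fidelity, while loosening it rewards aggressive optimizations that trade some precision for speed; an error-aware variant like ES(t) would additionally fold the magnitude of `max_rel_err` into the score rather than applying a hard cutoff.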