SynPerf: A Hybrid Analytical-ML Framework for GPU Performance Prediction

📅 2026-01-21

📈 Citations: 0

✨ Influential: 0

career value

243K/year

🤖 AI Summary

This work addresses the limitations of existing data-driven GPU performance models, which struggle to generalize across architectures and accurately model complex production-level kernels—hindering efficient hardware selection for large language model (LLM) inference. To overcome this, we propose SynPerf, the first approach that integrates analytical modeling with machine learning. SynPerf employs an analytical model to quantify kernel demands on GPU heterogeneous instruction pipelines and leverages a machine learning model to capture cross-pipeline interactions and resource dependencies, enabling high-fidelity performance prediction. Evaluated across 11 GPU architectures, SynPerf achieves kernel-level and end-to-end inference prediction errors as low as 6.1% and 8.5%, respectively—improving over state-of-the-art methods by 6.7× and 4.4×. Furthermore, it successfully guides the optimization of MoE Triton kernels, yielding up to a 1.7× speedup.

Technology Category

Application Category

📝 Abstract

The rapid expansion of Transformer-based large language models has dramatically increased the need for high-performance GPUs. As a result, there is growing demand for fast, accurate, and widely generalizable GPU performance models to support next-generation hardware selection and system-level exploration. However, current data-driven methods are limited, exhibiting poor generalization across hardware and inadequate modeling of complex production-level kernels common in modern inference stacks. To address these issues, we present SyncPerf, a unified GPU modeling framework. This approach first employs an analytical model to quantify a given kernel's demands on the GPU's heterogeneous instruction pipelines. These analytical features are then fed into a machine learning (ML) model to capture complex cross-pipeline interactions and resource dependencies, enabling high-fidelity performance prediction. Our evaluation across 11 GPU types from four generations of major architectures on two widely-used serving systems demonstrates that SyncPerf delivers high fidelity and strong generalizability. It achieves accurate predictions, with only 6.1% average error at the kernel level and 8.5% for end-to-end inference -- reducing the error of state-of-the-art methods by 6.7x and 4.4x, respectively. We also demonstrate SynPerf's value"beyond simulation"by utilizing its performance ceiling to diagnose implementation shortcomings and guide the optimization of a production fused MoE Triton kernel, achieving up to 1.7x speedup.

Problem

Research questions and friction points this paper is trying to address.

GPU performance prediction

generalization

production-level kernels

hardware selection

inference systems

Innovation

Methods, ideas, or system contributions that make the work stand out.

hybrid analytical-ML modeling

GPU performance prediction

kernel-level performance modeling