🤖 AI Summary
This work addresses the significant slowdown of GPU architecture simulation compared to native execution and the limitations of existing sampling techniques that rely on handcrafted features, which struggle to balance accuracy and efficiency. To overcome these challenges, the study introduces graph contrastive learning for GPU workload sampling: a novel approach that constructs trace graphs capturing instruction sequences and data dependencies, and employs a relational graph convolutional network to automatically uncover high-dimensional semantic and structural similarities among kernels. This method transcends the representational constraints of traditional handcrafted features. Extensive benchmark evaluations demonstrate that the proposed technique achieves an average speedup of 258.94× with only 0.37% error, substantially outperforming state-of-the-art methods such as PKA, Sieve, and STEM+ROOT.
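To make the contrastive-learning idea concrete, the sketch below shows a standard InfoNCE-style objective of the kind typically used in graph contrastive learning: two augmented views of the same kernel's trace graph form a positive pair, and all other kernels in the batch act as negatives. This is a minimal NumPy illustration of the general technique, not the paper's implementation; the function name, temperature, and data are all illustrative.

```python
import numpy as np

def info_nce(z1, z2, tau=0.1):
    """InfoNCE contrastive loss: row i of z1 and row i of z2 are two
    views (embeddings) of the same kernel, every other row in the
    batch is a negative. Lower loss = positive pairs more aligned."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    sim = z1 @ z2.T / tau                     # pairwise cosine / temperature
    sim -= sim.max(axis=1, keepdims=True)     # numerical stability
    logp = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(logp))            # push positives to the top

rng = np.random.default_rng(1)
a = rng.normal(size=(8, 16))                  # 8 kernels, 16-dim embeddings
loss_rand = info_nce(a, rng.normal(size=(8, 16)))        # unrelated views
loss_same = info_nce(a, a + 0.01 * rng.normal(size=(8, 16)))  # aligned views
print(loss_same < loss_rand)
```

Training the graph encoder to minimize this loss is what lets similar kernels cluster in embedding space, so that one representative per cluster can be simulated in place of the full workload.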
📝 Abstract
GPU architectural simulation is orders of magnitude slower than native execution, necessitating workload sampling for practical speedups. Existing methods rely on hand-crafted features with limited expressiveness, yielding either aggressive sampling with high errors or conservative sampling with constrained speedups. To address these issues, we propose GCL-Sampler, a sampling framework that leverages Relational Graph Convolutional Networks with contrastive learning to automatically discover high-dimensional kernel similarities from trace graphs. By encoding instruction sequences and data dependencies into graph embeddings, GCL-Sampler captures rich structural and semantic properties of program execution, enabling both high fidelity and substantial speedup. Evaluations on extensive benchmarks show that GCL-Sampler achieves a 258.94x average speedup over full-workload simulation with 0.37% error, outperforming the state-of-the-art methods PKA (129.23x, 20.90%), Sieve (94.90x, 4.10%), and STEM+ROOT (56.57x, 0.38%).
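The core encoding step can be sketched as one relational graph convolution layer over a trace graph whose nodes are instructions and whose edge types distinguish sequential order from data dependencies. This is a toy NumPy sketch of the standard R-GCN propagation rule under those assumptions; the layer sizes, normalization, and random trace graph are illustrative and not taken from the paper.

```python
import numpy as np

def rgcn_layer(h, adj_by_rel, w_by_rel, w_self):
    """One relational GCN layer: per edge type (relation), aggregate
    neighbor features, apply a relation-specific weight, add a
    self-loop transform, then a ReLU nonlinearity."""
    out = h @ w_self
    for a, w in zip(adj_by_rel, w_by_rel):
        deg = np.maximum(a.sum(axis=1, keepdims=True), 1.0)  # avoid /0
        out += (a @ h / deg) @ w       # mean over that relation's neighbors
    return np.maximum(out, 0.0)        # ReLU

rng = np.random.default_rng(0)
n, d_in, d_out = 5, 8, 4               # 5 trace-graph nodes (instructions)
h = rng.normal(size=(n, d_in))         # initial node features
# Two relations: instruction order and data dependencies (illustrative).
seq = np.eye(n, k=1)                   # edge i -> i+1: sequential order
dep = (rng.random((n, n)) < 0.3) * 1.0 # random data-dependence edges
ws = [rng.normal(size=(d_in, d_out)) for _ in range(2)]
w0 = rng.normal(size=(d_in, d_out))
z = rgcn_layer(h, [seq, dep], ws, w0)  # per-node embeddings, shape (5, 4)
emb = z.mean(axis=0)                   # mean-pooled graph-level embedding
print(z.shape, emb.shape)
```

In a full pipeline, the pooled graph embedding would feed the contrastive objective during training, and kernels would then be clustered by embedding similarity to pick representatives for simulation.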