KernelSkill: A Multi-Agent Framework for GPU Kernel Optimization

📅 2026-03-10
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the inefficiency and lack of interpretability in existing large language model–based GPU kernel optimization methods, which rely on implicit heuristics. To overcome these limitations, the authors propose KernelSkill, a knowledge-driven, trajectory-aware multi-agent optimization framework featuring a novel dual-level memory architecture. The long-term memory stores reusable expert optimization skills, while the short-term memory prevents redundant search efforts, thereby transforming implicit heuristics into explicit, structured optimization knowledge. Experimental results demonstrate that KernelSkill achieves 100% optimization success across KernelBench Levels 1–3, delivering average speedups of 5.44×, 2.82×, and 1.92× over Torch Eager, significantly outperforming current baseline approaches.

πŸ“ Abstract
Improving GPU kernel efficiency is crucial for advancing AI systems. Recent work has explored leveraging large language models (LLMs) for GPU kernel generation and optimization. However, existing LLM-based kernel optimization pipelines typically rely on opaque, implicitly learned heuristics within the LLMs to determine optimization strategies. This leads to inefficient trial-and-error and weakly interpretable optimizations. Our key insight is to replace implicit heuristics with expert optimization skills that are knowledge-driven and aware of task trajectories. Specifically, we present KernelSkill, a multi-agent framework with a dual-level memory architecture. KernelSkill operates by coordinating agents with long-term memory of reusable expert skills and short-term memory to prevent repetitive backtracking. On KernelBench Levels 1-3, KernelSkill achieves a 100% success rate and average speedups of 5.44x, 2.82x, and 1.92x over Torch Eager on Levels 1, 2, and 3, respectively, outperforming prior baselines. Code is available at https://github.com/0satan0/KernelMem/.
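The dual-level memory described in the abstract can be sketched as follows. This is an illustrative assumption of how such a design might look, not the paper's actual API: the `Skill`, `DualLevelMemory`, `retrieve`, and `record_attempt` names are all hypothetical.

```python
# Hypothetical sketch of a dual-level memory for LLM-based kernel optimization:
# long-term memory holds reusable expert skills; short-term memory records
# strategies already tried on the current task to avoid redundant backtracking.
from dataclasses import dataclass, field


@dataclass
class Skill:
    """A reusable expert optimization skill (e.g. tiling, operator fusion)."""
    name: str
    applies_to: set  # kernel traits this skill is relevant for


@dataclass
class DualLevelMemory:
    long_term: list = field(default_factory=list)  # reusable skills across tasks
    short_term: set = field(default_factory=set)   # skill names tried on this task

    def retrieve(self, kernel_traits: set) -> list:
        """Return relevant skills that have not yet been tried on this task."""
        return [
            s for s in self.long_term
            if s.applies_to & kernel_traits and s.name not in self.short_term
        ]

    def record_attempt(self, skill_name: str) -> None:
        """Short-term memory: never retry the same strategy on this task."""
        self.short_term.add(skill_name)


mem = DualLevelMemory(long_term=[
    Skill("loop_tiling", {"matmul", "conv"}),
    Skill("operator_fusion", {"elementwise", "conv"}),
])
mem.record_attempt("loop_tiling")          # already attempted this episode
candidates = mem.retrieve({"conv"})        # only untried, relevant skills remain
```

In this sketch the short-term memory is reset per optimization task, while the long-term skill store persists and grows, mirroring the paper's stated split between reusable expert knowledge and trajectory awareness.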
Problem

Research questions and friction points this paper is trying to address.

GPU kernel optimization
large language models
heuristics
interpretability
efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-agent framework
GPU kernel optimization
expert skills
dual-level memory
knowledge-driven optimization
Qitong Sun
School of Computer Science and Engineering, Beihang University, China; Zhejiang Lab, China
Jun Han
School of Computer Science and Engineering, Beihang University, China
Tianlin Li
Nanyang Technological University
AI4SE · SE4AI · Trustworthy AI
Zhe Tang
University of Liverpool
WSN · IoT · hybrid network
Sheng Chen
Zhejiang Lab, China
Fei Yang
Zhejiang Lab
Theoretical Computer Science · Artificial Intelligence
Aishan Liu
School of Computer Science and Engineering, Beihang University, China
Xianglong Liu
School of Computer Science and Engineering, Beihang University, China
Yang Liu
Nanyang Technological University
Agent · Software Engineering · Cyber Security · Trustworthy AI · Software Security