🤖 AI Summary
Current large language models (LLMs) exhibit limited effectiveness in performance-critical tasks such as GPU kernel optimization, primarily due to their inability to perceive the dynamic interplay between hardware environments and code execution behavior. To address this, we propose an environment-aware reasoning framework that enables LLMs to actively invoke GPU profiling tools (e.g., Nsight, nvprof) during inference, thereby acquiring real-time runtime metrics and dynamically adapting their reasoning paths accordingly. Our approach integrates chain-of-thought prompting, tool-augmented fine-tuning, and feedback-driven training. Experimental evaluation on real GPU workloads demonstrates substantial improvements: a +32.7% increase in optimization recommendation accuracy and an average 1.89× speedup in kernel execution time. To our knowledge, this is the first work enabling LLMs to achieve closed-loop perception and responsive adaptation to hardware-level performance bottlenecks.
📝 Abstract
Language models are now prevalent in software engineering, with many developers using them to automate tasks and accelerate development. While language models have proven highly capable on complex software engineering tasks, there are still areas where they fail to deliver desirable results, such as tasks related to code performance. Optimization, for instance, depends on complex signals from the execution environment, hardware, and runtime behavior that are not directly represented in source code. Recent work has achieved large improvements on general code-modeling tasks using chain-of-thought reasoning, but these models still fail to comprehend how the environment interacts with code performance. In this paper, we propose a methodology for training language models that can interact with performance tools during their reasoning process, and we demonstrate how this methodology can be used to train a state-of-the-art GPU kernel optimization model.
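The closed-loop perception described above can be pictured as a simple harness: profile the current kernel, hand the metrics to the model, and let the metrics steer the next reasoning step. The sketch below is illustrative only — all function names, metric names, and thresholds are assumptions, and the profiler call is stubbed rather than shelling out to a real tool such as Nsight Compute.

```python
# Hypothetical sketch of a profile-in-the-loop reasoning harness.
# The real system would invoke a GPU profiler (e.g. Nsight) and query an LLM;
# both are stubbed here so the control flow is runnable anywhere.

def profile_kernel(kernel_src: str) -> dict:
    """Stub for a profiler invocation. A real implementation would run the
    kernel under a profiler and parse its metric report; values here are
    hard-coded for illustration."""
    return {"achieved_occupancy": 0.42, "dram_throughput_pct": 88.0}

def next_reasoning_step(metrics: dict) -> str:
    """Toy policy standing in for the model's metric-conditioned reasoning."""
    if metrics["dram_throughput_pct"] > 80.0:
        return "memory-bound: consider coalescing or shared-memory tiling"
    if metrics["achieved_occupancy"] < 0.5:
        return "low occupancy: reduce register pressure or block size"
    return "no dominant bottleneck: try instruction-level optimizations"

def optimization_loop(kernel_src: str, max_rounds: int = 3) -> list:
    """Closed loop: profile, then condition each reasoning step on the
    observed runtime metrics rather than on source code alone."""
    trace = []
    for _ in range(max_rounds):
        metrics = profile_kernel(kernel_src)
        trace.append(next_reasoning_step(metrics))
        # A real loop would also apply the suggested rewrite here and
        # re-profile the updated kernel on the next iteration.
    return trace
```

The point of the loop is that the bottleneck diagnosis comes from measured hardware behavior, not from static inspection of the kernel source — which is precisely the signal the paper argues current models lack.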