🤖 AI Summary
Current large language models (LLMs) exhibit limited effectiveness in performance-critical tasks such as GPU kernel optimization, primarily due to their inability to perceive the dynamic interplay between hardware environments and code execution behavior. To address this, we propose an environment-aware reasoning framework that enables LLMs to actively invoke GPU profiling tools (e.g., Nsight, nvprof) during inference, thereby acquiring real-time runtime metrics and dynamically adapting their reasoning paths accordingly. Our approach integrates chain-of-thought prompting, tool-augmented fine-tuning, and feedback-driven training. Experimental evaluation on real GPU workloads demonstrates substantial improvements: a +32.7% increase in optimization recommendation accuracy and an average 1.89× speedup in kernel execution time. To our knowledge, this is the first work enabling LLMs to achieve closed-loop perception and responsive adaptation to hardware-level performance bottlenecks.
📝 Abstract
Language models are now prevalent in software engineering, with many developers using them to automate tasks and accelerate development. While language models have proven highly capable on complex software engineering tasks, there are still areas where they fail to deliver desirable results, such as tasks related to code performance. Optimization, for instance, depends on complex signals from the execution environment, hardware, and runtime behavior that are not directly represented in source code. Recent work has achieved large improvements on general code-modeling tasks using chain-of-thought reasoning, but these models still fail to comprehend how the environment interacts with code performance. In this paper, we propose a methodology for training language models that can interact with performance tools during their reasoning process, and we demonstrate how this methodology can be used to train a state-of-the-art GPU kernel optimization model.
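The closed-loop perception described above can be pictured as a simple harness: profile the current kernel, hand the metrics to the model, and let the metrics steer the next reasoning step. The sketch below is illustrative only — all function names, metric names, and thresholds are assumptions, and the profiler call is stubbed rather than shelling out to a real tool such as Nsight Compute.

```python
# Hypothetical sketch of a profile-in-the-loop reasoning harness.
# The real system would invoke a GPU profiler (e.g. Nsight) and query an LLM;
# both are stubbed here so the control flow is runnable anywhere.

def profile_kernel(kernel_src: str) -> dict:
    """Stub for a profiler invocation. A real implementation would run the
    kernel under a profiler and parse its metric report; values here are
    hard-coded for illustration."""
    return {"achieved_occupancy": 0.42, "dram_throughput_pct": 88.0}

def next_reasoning_step(metrics: dict) -> str:
    """Toy policy standing in for the model's metric-conditioned reasoning."""
    if metrics["dram_throughput_pct"] > 80.0:
        return "memory-bound: consider coalescing or shared-memory tiling"
    if metrics["achieved_occupancy"] < 0.5:
        return "low occupancy: reduce register pressure or block size"
    return "no dominant bottleneck: try instruction-level optimizations"

def optimization_loop(kernel_src: str, max_rounds: int = 3) -> list:
    """Closed loop: profile, then condition each reasoning step on the
    observed runtime metrics rather than on source code alone."""
    trace = []
    for _ in range(max_rounds):
        metrics = profile_kernel(kernel_src)
        trace.append(next_reasoning_step(metrics))
        # A real loop would also apply the suggested rewrite here and
        # re-profile the updated kernel on the next iteration.
    return trace
```

The point of the loop is that the bottleneck diagnosis comes from measured hardware behavior, not from static inspection of the kernel source — which is precisely the signal the paper argues current models lack.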