AutoTriton: Automatic Triton Programming with Reinforcement Learning in LLMs

📅 2025-07-08
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Developing deep learning kernels is hampered by hardware-specific adaptation and costly manual tuning. Method: This paper is the first to apply reinforcement learning (RL) to automated Triton programming. It builds a high-quality training data pipeline for supervised fine-tuning (SFT), then applies RL with Group Relative Policy Optimization (GRPO), driven by a staged reward that combines rule-based checks with real execution feedback. Contribution/Results: Evaluated across five benchmark channels of TritonBench and KernelBench, the 8B model achieves performance on par with Claude-4-Sonnet and DeepSeek-R1-0528. The approach improves both the efficiency and quality of automated kernel generation while substantially reducing reliance on expert domain knowledge.

📝 Abstract
Kernel development in deep learning requires optimizing computational units across hardware while balancing memory management, parallelism, and hardware-specific optimizations through extensive empirical tuning. Although domain-specific languages like Triton simplify GPU programming by abstracting low-level details, developers must still manually tune critical parameters such as tile sizes and memory access patterns through iterative experimentation, creating substantial barriers to optimal performance and wider adoption. In this work, we introduce AutoTriton, the first model dedicated to Triton programming powered by reinforcement learning (RL). AutoTriton first performs supervised fine-tuning (SFT) on data gathered by a high-quality pipeline to acquire essential Triton programming expertise, and then conducts RL with the Group Relative Policy Optimization (GRPO) algorithm, combining a rule-based reward and an execution-based reward to further improve its Triton programming ability. Experiments across five evaluation channels of TritonBench and KernelBench show that our 8B model AutoTriton achieves performance comparable to mainstream large models, including Claude-4-Sonnet and DeepSeek-R1-0528. Further experimental analysis demonstrates the crucial role of each module within AutoTriton, including the SFT stage, the RL stage, and the reward design strategy. These findings underscore the promise of RL for automatically generating high-performance kernels, and since high-performance kernels are core components of AI systems, this work establishes an important foundation for building more efficient AI systems. The model and code will be available at https://github.com/AI9Stars/AutoTriton.
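The abstract's RL stage uses GRPO, whose core idea is to score a group of sampled kernel candidates per prompt and normalize each reward within its group. A minimal sketch of this group-relative advantage, following the standard GRPO formulation rather than the authors' exact implementation:

```python
# Hedged sketch: group-relative advantage as used in GRPO-style training.
# For one prompt, sample a group of G candidate kernels, score each with a
# scalar reward, and normalize rewards to zero mean / unit std within the
# group. The normalized value is the advantage used to weight policy updates.
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize a group of rewards (GRPO-style); eps avoids division by zero."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Illustrative group of four sampled kernels for one prompt: the best
# candidate receives the largest positive advantage, ties stay tied.
advs = group_relative_advantages([1.0, 0.0, 0.5, 0.0])
```

Because advantages are computed relative to the group rather than a learned value baseline, GRPO needs no critic network, which keeps RL training for code generation comparatively cheap.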
Problem

Research questions and friction points this paper is trying to address.

Automates Triton GPU programming using reinforcement learning
Optimizes kernel parameters like tile sizes and memory access
Enhances AI system efficiency via high-performance kernel generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses reinforcement learning for Triton programming
Combines rule-based and execution-based rewards
Achieves performance comparable to large models
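The combined rule-based and execution-based reward named above can be pictured as a staged gate: cheap static checks first, then execution feedback only for candidates that pass them. The stage values and checks below are illustrative assumptions, not the paper's exact reward scheme:

```python
# Hedged sketch of a staged reward for generated Triton kernels (assumed
# staging, not AutoTriton's published reward). Stage 1 is rule-based and
# needs no execution; stage 2 consumes flags assumed to come from running
# the kernel in a sandbox against a reference implementation.
import ast

def staged_reward(code: str, runs_ok: bool, matches_reference: bool) -> float:
    # Stage 1: rule-based checks on the raw text.
    try:
        ast.parse(code)                  # must be syntactically valid Python
    except SyntaxError:
        return -1.0
    if "@triton.jit" not in code:        # must actually define a Triton kernel
        return -0.5
    # Stage 2: execution-based feedback.
    if not runs_ok:
        return 0.0                       # compiles but fails at runtime
    return 1.0 if matches_reference else 0.5
```

Ordering the stages this way lets the trainer reject malformed outputs without paying GPU execution cost, reserving real kernel launches for plausible candidates.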
Authors
Shangzhan Li (Tsinghua University)
Zefan Wang (Tsinghua University)
Ye He (Tsinghua University)
Yuxuan Li (Tsinghua University)
Qi Shi (Tsinghua University)
Jianling Li (Tianjin University)
Yonggang Hu (OpenBMB)
Wanxiang Che (Harbin Institute of Technology)
Xu Han (Tsinghua University)
Zhiyuan Liu (Tsinghua University)
Maosong Sun (Tsinghua University)