AccelOpt: A Self-Improving LLM Agentic System for AI Accelerator Kernel Optimization

📅 2025-11-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
Kernel optimization for emerging AI accelerators heavily relies on expert hardware knowledge, hindering automation and scalability. Method: This paper proposes a self-improving optimization framework powered by large language model (LLM) agents. It maintains an optimization memory that curates experiences and insights from previously encountered slow–fast kernel pairs, and drives iterative generation–feedback loops to achieve end-to-end automatic tuning without requiring domain-specific hardware expertise. The framework is evaluated on the NKIBench benchmark. Results: On kernels extracted from real LLM workloads, it raises the average percentage of peak throughput to 61% (+12 percentage points) on Trainium 1 and 59% (+14 percentage points) on Trainium 2, substantially outperforming baselines. Using open-source LLMs, it matches the kernel improvements of Claude Sonnet 4 at 1/26 of the cost. Contribution: This work pioneers the integration of LLM agents, self-improving memory, and accelerator kernel optimization, establishing a new paradigm for autonomous AI system optimization.

📝 Abstract
We present AccelOpt, a self-improving large language model (LLM) agentic system that autonomously optimizes kernels for emerging AI accelerators, eliminating the need for expert-provided hardware-specific optimization knowledge. AccelOpt explores the kernel optimization space through iterative generation, informed by an optimization memory that curates experiences and insights from previously encountered slow-fast kernel pairs. We build NKIBench, a new benchmark suite of AWS Trainium accelerator kernels with varying complexity extracted from real-world LLM workloads to evaluate the effectiveness of AccelOpt. Our evaluation confirms that AccelOpt's capability improves over time, boosting the average percentage of peak throughput from 49% to 61% on Trainium 1 and from 45% to 59% on Trainium 2 for NKIBench kernels. Moreover, AccelOpt is highly cost-effective: using open-source models, it matches the kernel improvements of Claude Sonnet 4 while being 26× cheaper.
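The iterative generate–measure–curate loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not AccelOpt's implementation: the function names, the toy throughput model, and the string-based "kernels" are all hypothetical stand-ins for an LLM agent, a real compiler, and on-device profiling.

```python
# Hedged sketch of a self-improving kernel-optimization loop.
# All names and behaviors here are hypothetical stand-ins.

def measure_throughput(kernel: str) -> float:
    # Stand-in for compiling and profiling a kernel on the accelerator.
    # Here we simply pretend an "unrolled" variant is faster (toy model).
    return 2.0 if "unrolled" in kernel else 1.0

def propose_candidate(kernel: str, memory: list) -> str:
    # Stand-in for LLM generation conditioned on curated insights
    # from previously seen slow-fast kernel pairs.
    return kernel + " unrolled"

def optimize_kernel(kernel: str, num_iters: int = 3):
    # Optimization memory: (slow kernel, fast kernel, insight) triples,
    # grown only when a candidate actually improves throughput.
    memory = []
    best, best_tput = kernel, measure_throughput(kernel)
    for _ in range(num_iters):
        candidate = propose_candidate(best, memory)   # generation step
        tput = measure_throughput(candidate)          # feedback step
        if tput > best_tput:
            # Curate the experience: the slow-fast pair plus an insight.
            memory.append((best, candidate, f"{tput / best_tput:.2f}x speedup"))
            best, best_tput = candidate, tput
    return best, best_tput, memory
```

In this sketch the memory only accumulates verified improvements, so later generation steps are conditioned on experiences that demonstrably helped, which is the mechanism behind the capability improving over time.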
Problem

Research questions and friction points this paper is trying to address.

Autonomous optimization of AI accelerator kernels without expert knowledge
Self-improving system using memory from kernel optimization experiences
Cost-effective kernel optimization matching expensive models' performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-improving LLM system autonomously optimizes AI accelerator kernels
Explores optimization space using memory of kernel pairs
Achieves cost-effective performance matching expensive models