🤖 AI Summary
Optimizing ML model kernels for heterogeneous hardware (x86, Arm, RISC-V, GPUs) remains challenging due to poor portability, high manual effort, and the limitations of existing automated approaches, which rely heavily on hardware-specific heuristics and opaque intermediate representations. Method: The paper introduces PerfLLM, an automatic optimization methodology that combines large language models (LLMs) with reinforcement learning (RL), built on PerfDojo, an environment that frames optimization as an RL game over a mathematically formalized, semantically transparent code representation, requiring no hardware-specific priors or low-level architectural knowledge. It natively supports modern model features including sparsity and quantization. Contribution/Results: Experiments across diverse CPU and GPU architectures demonstrate that the approach consistently outperforms state-of-the-art baselines (e.g., OpenBLAS, cuBLAS, TVM), generating kernels that combine strong performance, cross-platform portability, and human-interpretable source code.
📝 Abstract
The increasing complexity of machine learning models and the proliferation of diverse hardware architectures (CPUs, GPUs, accelerators) make achieving optimal performance a significant challenge. Heterogeneity in instruction sets, specialized kernel requirements for different data types and model features (e.g., sparsity, quantization), and architecture-specific optimizations complicate performance tuning. Manual optimization is resource-intensive, while existing automatic approaches often rely on complex hardware-specific heuristics and uninterpretable intermediate representations, hindering performance portability. We introduce PerfLLM, a novel automatic optimization methodology leveraging Large Language Models (LLMs) and Reinforcement Learning (RL). Central to this is PerfDojo, an environment framing optimization as an RL game using a human-readable, mathematically inspired code representation whose transformations guarantee semantic validity. This allows effective optimization without prior hardware knowledge, facilitating both human analysis and RL agent training. We demonstrate PerfLLM's ability to achieve significant performance gains across diverse CPU (x86, Arm, RISC-V) and GPU architectures.
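To make the "optimization as an RL game" framing concrete, here is a minimal, purely illustrative sketch: states are code representations, actions are semantics-preserving transformations, and the reward is the measured improvement after each rewrite. All names, the toy loop-nest state, and the toy cost model are assumptions for illustration, not PerfDojo's actual API or representation.

```python
import random

# Toy "code representation": a loop nest described by a tile size and
# an unroll flag. Each action rewrites the program without changing
# its semantics (in a real system, transformations would be proven or
# constructed to preserve meaning).
TRANSFORMATIONS = {
    "tile_2x": lambda s: {**s, "tile": s["tile"] * 2},
    "untile":  lambda s: {**s, "tile": max(1, s["tile"] // 2)},
    "unroll":  lambda s: {**s, "unroll": True},
}

def toy_cost(state):
    # Stand-in for running a real benchmark: lower is better.
    # Pretends tile=8 with unrolling is the optimum.
    return abs(state["tile"] - 8) + (0 if state["unroll"] else 4) + 1

class KernelOptEnv:
    """Minimal RL environment: each action rewrites the program;
    the reward is the cost reduction relative to the previous state."""
    def __init__(self):
        self.state = None

    def reset(self):
        self.state = {"tile": 1, "unroll": False}
        return self.state

    def step(self, action):
        prev = toy_cost(self.state)
        self.state = TRANSFORMATIONS[action](self.state)
        reward = prev - toy_cost(self.state)   # improvement as reward
        done = toy_cost(self.state) == 1       # reached the toy optimum
        return self.state, reward, done

# A random-search "agent" over the transformation space; an LLM- or
# RL-guided agent would choose actions instead of sampling uniformly.
env = KernelOptEnv()
state = env.reset()
best = toy_cost(state)
for _ in range(100):
    action = random.choice(list(TRANSFORMATIONS))
    state, reward, done = env.step(action)
    best = min(best, toy_cost(state))
    if done:
        break
print("best cost found:", best)
```

Because every action is a semantics-preserving rewrite, any state the agent reaches is a valid program, so the search can explore aggressively without ever producing incorrect code.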