VeriLocc: End-to-End Cross-Architecture Register Allocation via LLM

📅 2025-06-20
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Traditional register allocation for GPUs relies heavily on hand-crafted heuristics, incurring high adaptation costs across rapidly evolving GPU architectures. Method: This paper introduces the first large language model (LLM)-based approach to register allocation, proposing an end-to-end, cross-architecture automation framework. It takes normalized MIR (Machine Intermediate Representation) as input and combines static analysis with an SMT-solver-guided regeneration mechanism to ensure the formal correctness of generated allocations. Contribution/Results: The method eliminates dependence on manual tuning and generalizes across GPU architectures. Experiments on representative kernels, GEMM and multi-head attention (MHA), show single-shot allocation accuracy of 85-99% and pass@100 approaching 100%. Empirical runtime evaluation demonstrates a >10% performance improvement over rocBLAS.
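The verifier-guided regeneration loop the summary describes can be sketched in plain Python. Everything here is a hedged illustration, not the paper's implementation: `propose_allocation` is a hypothetical stand-in for the fine-tuned LLM, and a simple interference-graph check stands in for the SMT-based verifier.

```python
import random


def verify(alloc, interference):
    """Check that no two simultaneously live values share a register.
    (Simplified stand-in for the paper's SMT-solver-based verifier.)"""
    return all(alloc[a] != alloc[b] for a, b in interference)


def propose_allocation(values, num_regs, rng):
    """Hypothetical stand-in for the fine-tuned LLM: sample one
    register assignment per virtual value."""
    return {v: rng.randrange(num_regs) for v in values}


def allocate(values, interference, num_regs, budget=100, seed=0):
    """Verifier-guided regeneration: resample candidate allocations
    until one passes verification or the attempt budget (the 'k' in
    pass@k) is exhausted."""
    rng = random.Random(seed)
    for _ in range(budget):
        alloc = propose_allocation(values, num_regs, rng)
        if verify(alloc, interference):
            return alloc
    return None


# Toy kernel: v0 and v1 are live at the same time, so they must not
# share a physical register.
values = ["v0", "v1", "v2"]
interference = [("v0", "v1")]
result = allocate(values, interference, num_regs=4)
assert result is not None and result["v0"] != result["v1"]
```

In the paper's setting the proposer is a trained model and the verifier is formal, so a passing allocation is guaranteed correct; the loop only trades attempts for coverage, which is why pass@100 can approach 100% even when single-shot accuracy is 85-99%.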

๐Ÿ“ Abstract
Modern GPUs evolve rapidly, yet production compilers still rely on hand-crafted register allocation heuristics that require substantial re-tuning for each hardware generation. We introduce VeriLocc, a framework that combines large language models (LLMs) with formal compiler techniques to enable generalizable and verifiable register allocation across GPU architectures. VeriLocc fine-tunes an LLM to translate intermediate representations (MIRs) into target-specific register assignments, aided by static analysis for cross-architecture normalization and generalization and a verifier-guided regeneration loop to ensure correctness. Evaluated on matrix multiplication (GEMM) and multi-head attention (MHA), VeriLocc achieves 85-99% single-shot accuracy and near-100% pass@100. A case study shows that VeriLocc discovers more performant assignments than expert-tuned libraries, outperforming rocBLAS by over 10% in runtime.
Problem

Research questions and friction points this paper is trying to address.

Automating GPU register allocation across architectures
Replacing manual heuristics with LLM-based generalization
Ensuring correctness via verifiable compiler techniques
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM fine-tuning for register allocation
Cross-architecture normalization via static analysis
Verifier-guided regeneration ensures correctness
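The cross-architecture normalization bullet above can be made concrete with a rough sketch. All of the details here are invented for illustration (the register syntax, the `%vr<N>` namespace, and the architecture names as dictionary keys); the paper's actual normalization operates on compiler MIR via static analysis.

```python
import re

# Hypothetical per-architecture register syntax, each mapped to one
# shared pattern here for simplicity; a real table would differ per
# target generation.
ARCH_REG_PATTERNS = {
    "gfx90a": re.compile(r"\b[vs](\d+)\b"),   # e.g. v12 (vector), s3 (scalar)
    "gfx1100": re.compile(r"\b[vs](\d+)\b"),
}


def normalize_mir(mir_text, arch):
    """Rewrite architecture-specific physical registers as abstract
    virtual registers (%vr<N>), so a single model can be trained on
    and generalize across multiple GPU targets."""
    pattern = ARCH_REG_PATTERNS[arch]
    return pattern.sub(lambda m: f"%vr{m.group(1)}", mir_text)


print(normalize_mir("v_add_f32 v2, v0, v1", "gfx90a"))
# v_add_f32 %vr2, %vr0, %vr1
```

The design intuition is that the LLM sees one normalized register namespace during training, while a thin per-architecture layer handles the mapping back to physical registers at allocation time.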