🤖 AI Summary
This work investigates the feasibility of leveraging large language models (LLMs) for performance optimization at the assembly level. To address the lack of systematic exploration of LLMs' code transformation capabilities, particularly for low-level assembly optimization, we propose the first PPO-based reinforcement learning framework for LLM-driven assembly optimization. The framework fine-tunes LLMs to maximize execution speed relative to gcc -O3 while strictly preserving functional correctness. Key contributions: (1) the first systematic application of RL to LLM-based assembly optimization; (2) the first assembly optimization benchmark, comprising 8,072 real-world programs; and (3) state-of-the-art results with Qwen2.5-Coder-7B, achieving a 96.0% test pass rate and a 1.47× average speedup, outperforming all 20 baseline models, including Claude-3.7-Sonnet.
📝 Abstract
Large language models (LLMs) have demonstrated strong performance across a wide range of programming tasks, yet their potential for code optimization remains underexplored. This work investigates whether LLMs can optimize the performance of assembly code, where fine-grained control over execution enables improvements that are difficult to express in high-level languages. We present a reinforcement learning framework that trains LLMs using Proximal Policy Optimization (PPO), guided by a reward function that considers both functional correctness, validated through test cases, and execution performance relative to the industry-standard compiler baseline gcc -O3. To support this study, we introduce a benchmark of 8,072 real-world programs. Our model, Qwen2.5-Coder-7B-PPO, achieves a 96.0% test pass rate and an average speedup of 1.47× over the gcc -O3 baseline, outperforming all 20 other models evaluated, including Claude-3.7-Sonnet. These results indicate that reinforcement learning can unlock the potential of LLMs to serve as effective optimizers for assembly code performance.
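The reward described above couples correctness with measured speed. One minimal way such a reward could be structured is sketched below; the function name, strict correctness gating, and ratio-based speedup term are illustrative assumptions, not the paper's exact formulation:

```python
def assembly_reward(tests_passed: int, total_tests: int,
                    baseline_time: float, candidate_time: float) -> float:
    """Hypothetical reward sketch: zero unless the optimized assembly
    passes every test case, otherwise the speedup ratio over the
    gcc -O3 baseline (baseline_time / candidate_time)."""
    if tests_passed < total_tests or candidate_time <= 0.0:
        # Strictly gate on functional correctness: an incorrect but fast
        # program earns no reward at all.
        return 0.0
    # Speedup > 1.0 means the candidate beats gcc -O3; values below 1.0
    # still reward correctness but penalize slower code.
    return baseline_time / candidate_time
```

Under this sketch, a candidate that passes all tests and halves the baseline runtime would receive a reward of 2.0, while any test failure collapses the reward to zero, steering PPO toward correctness-preserving optimizations.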