Exploring the Feasibility of End-to-End Large Language Model as a Compiler

📅 2025-06-30
🏛️ IEEE International Joint Conference on Neural Networks (IJCNN)
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates the feasibility of the “Large Language Model as a Compiler” (LaaC) paradigm—i.e., whether LLMs can perform end-to-end, precise compilation from source code to target assembly. To this end, the authors introduce CompilerEval, the first benchmark dataset specifically designed for compilation tasks, comprising multilingual source code paired with cross-platform (x86, ARM, RISC-V) assembly. They develop a dedicated evaluation framework and employ prompt engineering, chain-of-thought reasoning, and model scaling to systematically assess the source-code understanding and assembly-generation capabilities of leading open- and closed-source LLMs. Results show that current LLMs possess foundational compilation ability, and that targeted optimizations substantially improve assembly correctness and overall compilation success rates. This study provides the first systematic empirical validation of LaaC’s technical viability, proposes principled architectural guidelines and evolutionary pathways for compilation-oriented LLMs, and establishes a foundation for AI-native compiler research.

📝 Abstract
In recent years, end-to-end Large Language Model (LLM) technology has shown substantial advantages across various domains. As critical system software and infrastructure, compilers are responsible for transforming source code into target code. While LLMs have been leveraged to assist in compiler development and maintenance, their potential as an end-to-end compiler remains largely unexplored. This paper explores the feasibility of LLM as a Compiler (LaaC) and its future directions. We designed the CompilerEval dataset and framework specifically to evaluate the capabilities of mainstream LLMs in source code comprehension and assembly code generation. In the evaluation, we analyzed various errors, explored multiple methods to improve LLM-generated code, and evaluated cross-platform compilation capabilities. Experimental results demonstrate that LLMs exhibit basic capabilities as compilers but currently achieve low compilation success rates. By optimizing prompts, scaling up the model, and incorporating reasoning methods, the quality of assembly code generated by LLMs can be significantly enhanced. Based on these findings, we maintain an optimistic outlook for LaaC and propose practical architectural designs and future research directions. We believe that with targeted training, knowledge-rich prompts, and specialized infrastructure, LaaC has the potential to generate high-quality assembly code and drive a paradigm shift in the field of compilation.
Problem

Research questions and friction points this paper is trying to address.

Exploring LLM feasibility as end-to-end compiler for code transformation
Evaluating LLM capabilities in source code comprehension and assembly generation
Investigating methods to improve LLM-generated assembly code quality
Innovation

Methods, ideas, or system contributions that make the work stand out.

Using LLMs as end-to-end compilers for code transformation
Creating CompilerEval dataset to assess LLM compilation capabilities
Enhancing assembly code via prompt optimization and reasoning methods
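To make the evaluation idea concrete, the sketch below shows what a minimal CompilerEval-style harness could look like: build a compilation prompt, query a model for target assembly, and grade the result. All names (`make_prompt`, `llm_compile`, `grade`) are illustrative assumptions, not the paper's actual code, and the model call is stubbed; a faithful harness would assemble, link, and execute the output against a reference compiler's binary.

```python
# Illustrative sketch of an end-to-end "LLM as a Compiler" evaluation loop.
# Assumptions: function names and the grading heuristic are hypothetical;
# this is not the paper's CompilerEval implementation.

def make_prompt(source: str, target_isa: str) -> str:
    """Build a compilation prompt for one benchmark case."""
    return (
        f"You are a {target_isa} compiler. Translate the following C function "
        f"into {target_isa} assembly. Output assembly only.\n\n{source}"
    )

def llm_compile(source: str, target_isa: str) -> str:
    """Stand-in for a model API call; returns hand-written assembly here.
    A real run would send make_prompt(source, target_isa) to an LLM endpoint."""
    return "square:\n    imul edi, edi\n    mov eax, edi\n    ret\n"

def grade(asm: str, entry_symbol: str) -> bool:
    """Crude syntactic check: entry label present and the function returns.
    A faithful harness would instead assemble and run the generated code."""
    lines = [ln.strip() for ln in asm.splitlines() if ln.strip()]
    has_label = any(ln.startswith(entry_symbol + ":") for ln in lines)
    has_ret = any(ln == "ret" or ln.startswith("ret ") for ln in lines)
    return has_label and has_ret

source = "int square(int x) { return x * x; }"
asm = llm_compile(source, "x86-64")
print("pass" if grade(asm, "square") else "fail")
```

In the paper's setting, the grading step is where compilation success rates come from: each benchmark case either yields assembly that assembles and behaves correctly, or is counted as a failure.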
👥 Authors
Hongbin Zhang (Institute of Materials Science, TU Darmstadt)
Shihao Gao (Institute of Software, Chinese Academy of Sciences, Beijing, China)
Yang Liu (Institute of Software, Chinese Academy of Sciences, Beijing, China)
Mingjie Xing (Institute of Software, Chinese Academy of Sciences, Beijing, China)
Yanjun Wu (Institute of Software, Chinese Academy of Sciences)
Chen Zhao (Institute of Software, Chinese Academy of Sciences, Beijing, China)