QiMeng-NeuComBack: Self-Evolving Translation from IR to Assembly Code

📅 2025-11-02
🤖 AI Summary
Neural compilation faces two key challenges: the absence of dedicated benchmarks and reliable evaluation methodologies for IR-to-assembly generation, and the limited functional correctness and performance of large language model–generated assembly code. To address these, this paper introduces NeuComBack—the first cross-architecture (x86_64/aarch64) benchmark dataset specifically designed for IR-to-assembly translation. The authors further propose a self-evolving prompt optimization framework that dynamically refines prompts via program debugging feedback extraction, self-debugging trajectory mining, and multi-round iterative prompt updates. Experimental results show that the approach improves functional correctness from 44% to 64% on x86_64 and from 36% to 58% on aarch64. Moreover, 14 of the 16 correctly generated x86_64 programs (87.5%) outperform Clang -O3 in execution performance. NeuComBack thus establishes a foundational resource and methodology for advancing neural compilation research.
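The self-evolving loop described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: `toy_generate` and `toy_run_tests` are hypothetical stand-ins for an LLM call and a compile-and-test harness, and the `insights` list stands in for the evolving prompt state.

```python
# Sketch of the self-evolving prompt loop (assumed structure, not from the paper):
# each failed attempt contributes debugging feedback that conditions the next round.

def evolve(prompt, ir, generate, run_tests, max_rounds=3):
    """Refine IR-to-assembly generation by feeding mined feedback back in."""
    insights = []
    for _ in range(max_rounds):
        asm = generate(prompt, ir, insights)   # IR -> assembly attempt
        ok, feedback = run_tests(asm)          # assemble, link, run I/O checks
        if ok:
            return asm, insights               # functionally correct program
        insights.append(feedback)              # mine the self-debugging trace
    return None, insights

# Toy stand-ins: generation succeeds once the missing-`ret` insight is available.
def toy_generate(prompt, ir, insights):
    return "mov eax, 0\nret" if insights else "mov eax, 0"

def toy_run_tests(asm):
    return ("ret" in asm, "function falls through: missing `ret`")
```

Here the accumulated insights play the role of the dynamically refined prompt: correctness feedback from one round becomes context for the next.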

📝 Abstract
Compilers, while essential, are notoriously complex systems that demand prohibitively expensive human expertise to develop and maintain. The recent advancements in Large Language Models (LLMs) offer a compelling new paradigm: Neural Compilation, which could potentially simplify compiler development for new architectures and facilitate the discovery of innovative optimization techniques. However, several critical obstacles impede its practical adoption. Firstly, a significant lack of dedicated benchmarks and robust evaluation methodologies hinders objective assessment and tracking of progress in the field. Secondly, systematically enhancing the reliability and performance of LLM-generated assembly remains a critical challenge. Addressing these challenges, this paper introduces NeuComBack, a novel benchmark dataset specifically designed for IR-to-assembly compilation. Leveraging this dataset, we first define a foundational Neural Compilation workflow and conduct a comprehensive evaluation of the capabilities of recent frontier LLMs on Neural Compilation, establishing new performance baselines. We further propose a self-evolving prompt optimization method that enables LLMs to iteratively evolve their internal prompt strategies by extracting insights from prior self-debugging traces, thereby enhancing their neural compilation capabilities. Experiments demonstrate that our method significantly improves both the functional correctness and the performance of LLM-generated assembly code. Compared to baseline prompts, the functional correctness rates improved from 44% to 64% on x86_64 and from 36% to 58% on aarch64. More significantly, among the 16 correctly generated x86_64 programs using our method, 14 (87.5%) surpassed clang -O3 performance.
Problem

Research questions and friction points this paper is trying to address.

Addressing the lack of benchmarks for neural compilation evaluation
Enhancing reliability and performance of LLM-generated assembly code
Developing self-evolving prompts to optimize IR-to-assembly translation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Proposes a benchmark dataset for IR-to-assembly compilation
Introduces self-evolving prompt optimization using debugging traces
Enhances functional correctness and performance of generated code
Hainan Fang
State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Yuanbo Wen
Institute of Computing Technology, Chinese Academy of Sciences
Machine Learning System
Jun Bi
State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Yihan Wang
State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Tonghui He
State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Yanlin Tang
Assistant Professor at Department of Mathematics, Tongji University
Quantile regression · Hypothesis testing
Di Huang
State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Jiaming Guo
Institute of Computing Technology, Chinese Academy of Sciences
Artificial intelligence · Reinforcement Learning
Rui Zhang
State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Qi Guo
State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Yunji Chen
Institute of Computing Technology, Chinese Academy of Sciences
processor architecture · microarchitecture · machine learning