Integrating Rules and Semantics for LLM-Based C-to-Rust Translation

📅 2025-08-09
🤖 AI Summary
This work addresses the core challenge in automatically translating C code to Rust: ensuring memory safety and semantic consistency at the same time. To this end, we propose IRENE, a framework built on three cooperating modules: (1) rule-augmented retrieval, which integrates Rust memory-safety constraints derived from static analysis; (2) structured summarization guidance, which explicitly models code semantics via hierarchical abstraction; and (3) error-driven translation, which iteratively refines outputs using compiler feedback. Methodologically, IRENE injects formal safety rules into large language model (LLM)-based sequence-to-sequence translation, jointly optimizing syntactic correctness and semantic fidelity. Evaluated on the xCodeEval and Huawei HW-Bench benchmarks, IRENE significantly improves translation accuracy and memory safety across eight mainstream LLMs, reducing unsafe code generation by 42.6% on average. It establishes a scalable, safety-oriented paradigm for cross-language code migration.

📝 Abstract
Automated translation of legacy C code into Rust aims to ensure memory safety while reducing the burden of manual migration. Early approaches in code translation rely on static rule-based methods, but they suffer from limited coverage due to dependence on predefined rule patterns. Recent works regard the task as a sequence-to-sequence problem by leveraging large language models (LLMs). Although these LLM-based methods are capable of reducing unsafe code blocks, the translated code often exhibits issues in following Rust rules and maintaining semantic consistency. On one hand, existing methods adopt a direct prompting strategy to translate the C code, which struggles to accommodate the syntactic rules between C and Rust. On the other hand, this strategy makes it difficult for LLMs to accurately capture the semantics of complex code. To address these challenges, we propose IRENE, an LLM-based framework that Integrates RulEs aNd sEmantics to enhance translation. IRENE consists of three modules: 1) a rule-augmented retrieval module that selects relevant translation examples based on rules generated from a static analyzer developed by us, thereby improving the handling of Rust rules; 2) a structured summarization module that produces a structured summary for guiding LLMs to enhance the semantic understanding of C code; 3) an error-driven translation module that leverages compiler diagnostics to iteratively refine translations. We evaluate IRENE on two datasets (xCodeEval, a public dataset, and HW-Bench, an industrial dataset provided by Huawei) and eight LLMs, focusing on translation accuracy and safety.
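The rule-augmented retrieval idea can be illustrated with a minimal sketch. This is not the paper's implementation. It assumes a static analyzer has already tagged each C snippet with a set of rule labels, and it ranks stored translation examples by Jaccard overlap with the query's labels. The abstract does not name a similarity measure, so Jaccard is a stand-in, and `Example`, `jaccard`, `retrieve`, and every rule tag below are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Example:
    """A stored C-to-Rust translation pair with hypothetical rule tags."""
    c_code: str
    rust_code: str
    rules: FrozenSet[str]


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard similarity of two tag sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(query_rules: FrozenSet[str], corpus: List[Example], k: int = 2) -> List[Example]:
    """Return the k examples whose rule tags best overlap the query's tags."""
    ranked = sorted(corpus, key=lambda ex: jaccard(query_rules, ex.rules), reverse=True)
    return ranked[:k]


# Toy corpus; the rule tags are illustrative assumptions, not the paper's rule set.
CORPUS = [
    Example("int *p = malloc(4);", "let p = Box::new(0i32);",
            frozenset({"heap_alloc", "raw_pointer"})),
    Example("char s[8];", "let s = [0u8; 8];",
            frozenset({"fixed_array"})),
    Example("int *q = &x; *q = 1;", "let q = &mut x; *q = 1;",
            frozenset({"raw_pointer", "mutable_alias"})),
]

# Query tags as they might come from analyzing a new C snippet.
top = retrieve(frozenset({"raw_pointer", "heap_alloc"}), CORPUS, k=2)
```

The retrieved pairs would then be placed in the prompt as few-shot examples, so the model sees translations that exercise the same Rust rules as the input.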
Problem

Research questions and friction points this paper is trying to address.

Ensuring memory safety in C-to-Rust translation
Overcoming limited coverage of rule-based methods
Improving semantic consistency in LLM-based translations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Rule-augmented retrieval for Rust rule adherence
Structured summarization for semantic understanding
Error-driven iterative refinement using compiler diagnostics
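The error-driven refinement loop listed above can be sketched as follows. This is a minimal sketch under stated assumptions, not IRENE's actual code: `refine`, `fake_translate`, and `fake_compile_check` are hypothetical names, the round limit is arbitrary, and the two `fake_*` functions are toy stand-ins for an LLM call and a Rust compiler invocation so that the example runs on its own.

```python
from typing import Callable, List, Tuple


def refine(
    c_code: str,
    translate: Callable[[str, str, List[str]], str],
    compile_check: Callable[[str], List[str]],
    max_rounds: int = 3,
) -> Tuple[str, bool, int]:
    """Translate, compile, and retry with diagnostics until clean or out of rounds."""
    rust_code = ""
    diagnostics: List[str] = []
    for round_no in range(1, max_rounds + 1):
        # Feed the previous attempt and its diagnostics back to the translator.
        rust_code = translate(c_code, rust_code, diagnostics)
        diagnostics = compile_check(rust_code)
        if not diagnostics:
            return rust_code, True, round_no
    return rust_code, False, max_rounds


# Toy stand-ins so the sketch runs without an LLM or a Rust toolchain.
def fake_translate(c_code: str, previous: str, diagnostics: List[str]) -> str:
    # First attempt forgets `mut`; once a diagnostic is supplied, it adds it.
    if diagnostics:
        return "let mut x = 0; x += 1;"
    return "let x = 0; x += 1;"


def fake_compile_check(rust_code: str) -> List[str]:
    # Pretend compiler: flags assignment to an immutable binding.
    if "let mut x" not in rust_code:
        return ["cannot assign twice to immutable variable `x`"]
    return []


result, ok, rounds = refine("int x = 0; x += 1;", fake_translate, fake_compile_check)
```

In a real pipeline, `compile_check` might wrap a call to the Rust compiler and return its error messages, and `translate` would build a prompt from the C source, the previous attempt, and those messages.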
Feng Luo
School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China
Kexing Ji
School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China
Cuiyun Gao
School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China
Shuzheng Gao
The Chinese University of Hong Kong
Code Intelligence, Software Engineering, Large Language Models
Jia Feng
School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China
Kui Liu
Huawei Software Engineering Application Technology Lab, China
Xin Xia
Zhejiang University, China
Michael R. Lyu
Professor of Computer Science & Engineering, The Chinese University of Hong Kong
software engineering, software reliability, fault tolerance, machine learning, distributed systems