C2RustXW: Program-Structure-Aware C-to-Rust Translation via Program Analysis and LLM

📅 2026-03-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing tools for migrating C code to Rust often produce verbose, non-idiomatic output that retains unsafe semantics, which hurts maintainability and extensibility. This work integrates program analysis with large language models (LLMs): it extracts multi-level program structure, including control flow and data flow, and uses it to build structured prompts that guide dependency-aware translation and a multi-stage repair pipeline. It is presented as the first approach to incorporate deep program-structure information into LLM prompting, targeting both syntactic and semantic correctness. Evaluated on CodeNet and GitHub datasets, the method achieves 100% and 97.78% syntactic correctness, respectively, reduces code size by up to 43.70%, lowers unsafe usage to 5.75%, and attains an average project-level semantic correctness of 78.87%.
📝 Abstract
The growing adoption of Rust for its memory safety and performance has increased the demand for effective migration of legacy C codebases. However, existing rule-based translators (e.g., C2Rust) often generate verbose, non-idiomatic code that preserves unsafe C semantics, limiting readability, maintainability, and practical adoption. Moreover, manual post-processing of such outputs is labor-intensive and rarely yields high-quality Rust code, posing a significant barrier to large-scale migration. To address these limitations, we present C2RustXW, a program-structure-aware C-to-Rust translation approach that integrates program analysis with Large Language Models (LLMs). C2RustXW extracts the multi-level program structure, including global symbols, function dependencies, and control- and data-flow information, and encodes it as structured textual representations that are injected into LLM prompts to guide translation and repair. Based on this design, C2RustXW performs dependency-aware translation and adopts a multi-stage repair pipeline that combines rule-based and structure-guided LLM-based techniques to ensure syntactic correctness. For semantic correctness, C2RustXW further integrates execution-based validation with structure-guided reasoning to localize and repair behavioral inconsistencies. Experimental results show that C2RustXW achieves 100% syntactic correctness on CodeNet and 97.78% on GitHub, while significantly reducing code size (by up to 43.70%) and unsafe usage (to 5.75%). At the project level, C2RustXW achieves perfect syntactic correctness and an average semantic correctness of 78.87%, demonstrating its effectiveness for practical and scalable C-to-Rust migration.
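To make "dependency-aware translation" and "structured textual representations" more concrete, here is a minimal sketch in Rust. It is not the authors' implementation: the `FunctionInfo` type, the helper names (`func`, `sample_project`, `translation_order`, `build_prompt`), the sample project, and the prompt layout are all assumptions made for illustration. The sketch orders functions so that callees are translated before their callers, then renders one function's structural context as text that could be prepended to an LLM prompt.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical per-function summary that a program-analysis pass might emit.
struct FunctionInfo {
    name: String,
    callees: Vec<String>, // functions this one calls
    globals: Vec<String>, // global symbols it reads or writes
}

/// Small constructor used only to keep the sample data readable.
fn func(name: &str, callees: &[&str], globals: &[&str]) -> (String, FunctionInfo) {
    let info = FunctionInfo {
        name: name.to_string(),
        callees: callees.iter().map(|s| s.to_string()).collect(),
        globals: globals.iter().map(|s| s.to_string()).collect(),
    };
    (name.to_string(), info)
}

/// Made-up four-function project (illustrative data, not from the paper).
fn sample_project() -> BTreeMap<String, FunctionInfo> {
    BTreeMap::from([
        func("main", &["parse", "report"], &[]),
        func("parse", &["next_token"], &["g_line"]),
        func("next_token", &[], &["g_line"]),
        func("report", &[], &[]),
    ])
}

/// Post-order depth-first walk of the call graph: every callee is emitted
/// before its callers. Recursive cycles are cut at the first revisit, and
/// callees that are not defined in the project (e.g. libc) are skipped.
fn translation_order(funcs: &BTreeMap<String, FunctionInfo>) -> Vec<String> {
    fn visit(
        name: &str,
        funcs: &BTreeMap<String, FunctionInfo>,
        seen: &mut BTreeSet<String>,
        order: &mut Vec<String>,
    ) {
        if !seen.insert(name.to_string()) {
            return; // already visited, or currently on the stack
        }
        if let Some(info) = funcs.get(name) {
            for callee in &info.callees {
                visit(callee, funcs, seen, order);
            }
            order.push(name.to_string());
        }
    }

    let mut seen = BTreeSet::new();
    let mut order = Vec::new();
    for name in funcs.keys() {
        visit(name, funcs, &mut seen, &mut order);
    }
    order
}

/// Render the structural context of one function as plain text.
/// The section headings are an assumed layout, not the paper's format.
fn build_prompt(info: &FunctionInfo, c_source: &str, translated: &[String]) -> String {
    format!(
        "## Function\n{}\n## Globals used\n{}\n## Callees\n{}\n## Already translated\n{}\n## C source\n{}\n",
        info.name,
        info.globals.join(", "),
        info.callees.join(", "),
        translated.join(", "),
        c_source
    )
}

fn main() {
    let funcs = sample_project();
    let order = translation_order(&funcs);
    println!("translation order: {:?}", order);

    // Build the prompt context for `parse`, assuming `next_token` is done.
    let prompt = build_prompt(&funcs["parse"], "int parse(void) { /* ... */ }", &order[..1]);
    println!("{prompt}");
}
```

For this sample project the order is `next_token`, `parse`, `report`, `main`, so by the time `parse` is translated the Rust signature of `next_token` is already available to include in the prompt. A real pipeline would also need control- and data-flow facts per function, which this sketch leaves out.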
Problem

Research questions and friction points this paper is trying to address.

C-to-Rust translation
legacy code migration
memory safety
code maintainability
unsafe semantics
Innovation

Methods, ideas, or system contributions that make the work stand out.

program-structure-aware translation
C-to-Rust migration
large language models
program analysis
semantic correctness
Yanyan Yan
State Key Laboratory for Novel Software Technology, Nanjing University
Yang Feng
Nanjing University
Software Engineering
Jiangshan Liu
Southern University of Science and Technology
robotics, HRI, MR
Di Liu
Jiangsu Police Institute
Zixi Liu
Meta FAIR
Robotic Grasping, Tactile Sensing
Hao Teng
State Key Laboratory for Novel Software Technology, Nanjing University
Baowen Xu
Nanjing University
Software, Programming Languages