C2RustXW: Program-Structure-Aware C-to-Rust Translation via Program Analysis and LLM

📅 2026-03-30

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing tools for migrating C code to Rust often produce verbose, non-idiomatic output that retains unsafe C semantics, which hurts readability and maintainability. This work integrates program analysis with large language models (LLMs): it extracts multi-level program structure (global symbols, function dependencies, and control- and data-flow information) and encodes it in structured prompts that guide dependency-aware translation and a multi-stage repair pipeline. Injecting program structure into the prompts is intended to improve both syntactic and semantic correctness. Evaluated on CodeNet and GitHub datasets, the method achieves 100% and 97.78% syntactic correctness, respectively, reduces code size by up to 43.70%, lowers unsafe usage to 5.75%, and attains an average project-level semantic correctness of 78.87%.

📝 Abstract

The growing adoption of Rust for its memory safety and performance has increased the demand for effective migration of legacy C codebases. However, existing rule-based translators (e.g., C2Rust) often generate verbose, non-idiomatic code that preserves unsafe C semantics, limiting readability, maintainability, and practical adoption. Moreover, manual post-processing of such outputs is labor-intensive and rarely yields high-quality Rust code, posing a significant barrier to large-scale migration. To address these limitations, we present C2RustXW, a program-structure-aware C-to-Rust translation approach that integrates program analysis with Large Language Models (LLMs). C2RustXW extracts the multi-level program structure, including global symbols, function dependencies, and control- and data-flow information, and encodes these as structured textual representations injected into LLM prompts to guide translation and repair. Based on this design, C2RustXW performs dependency-aware translation and adopts a multi-stage repair pipeline that combines rule-based and structure-guided LLM-based techniques to ensure syntactic correctness. For semantic correctness, C2RustXW further integrates execution-based validation with structure-guided reasoning to localize and repair behavioral inconsistencies. Experimental results show that C2RustXW achieves 100% syntactic correctness on CodeNet and 97.78% on GitHub, while significantly reducing code size (up to 43.70%) and unsafe usage (to 5.75%). At the project level, C2RustXW achieves perfect syntactic correctness and an average semantic correctness of 78.87%, demonstrating its effectiveness for practical and scalable C-to-Rust migration.
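
The abstract does not spell out how dependency-aware translation or the structured prompts are implemented. The following is only a minimal sketch of one plausible reading, assuming that "dependency-aware" means translating callees before their callers and that the prompt lists the Rust signatures of callees that have already been translated. The call graph, the function names, the prompt layout, and the helpers `translation_order` and `build_prompt` are all hypothetical and are not taken from the paper.

```python
from graphlib import TopologicalSorter

# Hypothetical call graph: function name -> functions it calls (its dependencies).
CALL_GRAPH = {
    "main": ["parse_args", "run"],
    "run": ["checksum"],
    "parse_args": [],
    "checksum": [],
}


def translation_order(call_graph):
    """Order functions so that every callee comes before its callers."""
    # TopologicalSorter expects node -> predecessors, which matches
    # "function -> callees" when callees must be translated first.
    return list(TopologicalSorter(call_graph).static_order())


def build_prompt(name, c_source, call_graph, rust_signatures):
    """Assemble a structured prompt for one function (illustrative format only)."""
    lines = [f"## Function to translate: {name}", "## Already-translated callees:"]
    for callee in call_graph[name]:
        # Callee signatures are available because callees were translated earlier.
        lines.append(f"- {callee}: {rust_signatures[callee]}")
    lines += ["## C source:", c_source]
    return "\n".join(lines)


order = translation_order(CALL_GRAPH)
prompt = build_prompt(
    "run",
    "int run(const char *s) { return checksum(s); }",
    CALL_GRAPH,
    {"checksum": "fn checksum(s: &str) -> u32"},
)
```

In this sketch, `order` places `checksum` before `run` and `run` before `main`, so each prompt can refer to Rust signatures that already exist. Recursive or mutually recursive functions would make `static_order()` raise `graphlib.CycleError`; a real pipeline would need to group such functions, which this sketch does not attempt.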

Problem

Research questions and friction points this paper is trying to address.

C-to-Rust translation

legacy code migration

memory safety

code maintainability

unsafe semantics

Innovation

Methods, ideas, or system contributions that make the work stand out.

program-structure-aware translation

C-to-Rust migration

large language models