Project-Level C-to-Rust Translation via Synergistic Integration of Knowledge Graphs and Large Language Models

📅 2025-10-12

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing LLM-based approaches to project-level C-to-Rust translation lack a global understanding of pointer semantics, which limits the memory safety of the Rust code they produce. To address this, we propose C-Rust KG, the first knowledge graph that jointly models C pointer semantics and Rust's ownership model. It explicitly encodes global pointer flows, borrowing relationships, and lifetime constraints, and it feeds them into both the code dependency graph and the LLM prompts to form a semantics-enhanced cross-language translation framework. This lets LLMs generate idiomatic, memory-safe Rust code consistently within the context of the full project. In the reported experiments, the method reduces unsafe usages by 99.9% compared with rule-based translation and traditional LLM-based rewriting, and it achieves 29.3% higher functional correctness on average than fuzzing-enhanced LLM methods.

📝 Abstract

Translating C code into safe Rust is an effective way to ensure its memory safety. Compared to rule-based translation, which produces Rust code that remains largely unsafe, LLM-based methods can generate more idiomatic and safer Rust code because LLMs have been trained on vast amounts of human-written idiomatic code. Although promising, existing LLM-based methods still struggle with project-level C-to-Rust translation. They typically partition a C project into smaller units (e.g., functions) based on call graphs and translate them bottom-up to resolve program dependencies. However, this bottom-up, unit-by-unit paradigm often fails to translate pointers because it lacks a global perspective on their usage. To address this problem, we propose a novel C-Rust Pointer Knowledge Graph (KG) that enriches a code-dependency graph with two types of pointer semantics: (i) pointer-usage information, which records global behaviors such as points-to flows and maps lower-level struct usage to higher-level units; and (ii) Rust-oriented annotations, which encode ownership, mutability, nullability, and lifetime. Combining the KG with LLMs, we further propose a tool that implements a project-level C-to-Rust translation technique. In this tool, the KG provides LLMs with comprehensive pointer semantics from a global perspective, thus guiding them towards generating safe and idiomatic Rust code from a given C project. Our experiments show that the tool reduces unsafe usages in translated Rust by 99.9% compared to both rule-based translation and traditional LLM-based rewriting, while achieving an average of 29.3% higher functional correctness than fuzzing-enhanced LLM methods.
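The bottom-up, unit-by-unit paradigm that the abstract attributes to existing methods can be illustrated with a short sketch. The call graph and function names below are hypothetical and are not taken from the paper; the ordering is done with Python's standard `graphlib` module, used here only as a stand-in for whatever scheduling existing tools actually use.

```python
from graphlib import TopologicalSorter

# Hypothetical call graph of a small C project:
# each function maps to the set of functions it calls.
call_graph = {
    "main": {"list_push", "list_free"},
    "list_push": {"node_new"},
    "list_free": set(),
    "node_new": set(),
}

def bottom_up_order(graph):
    """Return the functions so that every callee precedes its callers."""
    # TopologicalSorter treats the mapped values as predecessors,
    # so callees are emitted before the functions that call them.
    return list(TopologicalSorter(graph).static_order())

order = bottom_up_order(call_graph)
print(order)
```

In this ordering `node_new` is translated before `list_push` and `main`, so at the moment its pointer return value must be given a Rust type, nothing is known about how callers later store, share, or free that pointer. This is the missing global perspective that the abstract describes.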

Problem

Research questions and friction points this paper is trying to address.

Enabling project-level C-to-Rust translation with global pointer semantics

Addressing unsafe pointer handling in existing LLM-based translation methods

Generating memory-safe idiomatic Rust code through knowledge graph integration

Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates knowledge graphs with LLMs for translation

Enriches code graphs with pointer semantics globally

Guides LLMs to generate safe idiomatic Rust code
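As a rough illustration of how Rust-oriented annotations on a pointer could guide an LLM, the sketch below models one annotated pointer node and turns its annotations into a prompt hint. Everything here is an assumption made for illustration: the class names, the field names, the `rust_type` mapping to `Box`, references, and `Option`, and the prompt wording are hypothetical and are not the paper's actual schema or rules.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class PointerAnnotation:
    """Hypothetical Rust-oriented annotations for one C pointer."""
    ownership: str                  # "owned" or "borrowed"
    mutable: bool
    nullable: bool
    lifetime: Optional[str] = None  # e.g. "a" for 'a; None if elided

@dataclass
class PointerNode:
    """Hypothetical KG node for a pointer, with global points-to edges."""
    name: str                       # e.g. "list_push::head"
    pointee: str                    # pointee type name
    annotation: PointerAnnotation
    points_to: List[str] = field(default_factory=list)

def rust_type(pointee: str, ann: PointerAnnotation) -> str:
    """Map annotations to a candidate Rust type (illustrative rule only)."""
    if ann.ownership == "owned":
        core = f"Box<{pointee}>"
    else:
        lifetime = f"'{ann.lifetime} " if ann.lifetime else ""
        mut = "mut " if ann.mutable else ""
        core = f"&{lifetime}{mut}{pointee}"
    # A nullable C pointer becomes an Option around the non-null type.
    return f"Option<{core}>" if ann.nullable else core

def prompt_hint(node: PointerNode) -> str:
    """Render one node as a line that could be added to an LLM prompt."""
    targets = ", ".join(node.points_to) or "nothing recorded"
    return (f"{node.name}: translate as {rust_type(node.pointee, node.annotation)}"
            f" (points to: {targets})")

head = PointerNode(
    name="list_push::head",
    pointee="Node",
    annotation=PointerAnnotation("borrowed", mutable=True, nullable=False, lifetime="a"),
    points_to=["main::list"],
)
print(prompt_hint(head))
```

The point of the sketch is only that ownership, mutability, nullability, and lifetime, once recorded globally, are enough to pick a concrete safe Rust type instead of a raw pointer; how the paper's KG derives and uses these annotations is not specified in this summary.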
