CRUST-Bench: A Comprehensive Benchmark for C-to-safe-Rust Transpilation

📅 2025-04-21
🤖 AI Summary
Existing work on C-to-Rust translation lacks a benchmark that assesses both memory safety and end-to-end functional correctness, which hinders the modernization of legacy C code. This paper introduces CRUST-Bench, the first repository-level benchmark for C-to-safe-Rust translation: it comprises 100 real-world C repositories, each accompanied by manually written, memory-safe Rust interfaces (serving as explicit specifications) and test suites that exercise dependencies across files. The authors evaluate LLM-based translation on the benchmark and analyze the resulting error patterns; even the best-performing model, OpenAI o1, solves only 15 of the 100 tasks in single-shot generation. The study attributes the failures to limitations such as inaccurate ownership inference, inadequate lifetime modeling, and insufficient handling of external dependencies, highlighting gaps in automated translation to safe systems code.

📝 Abstract
C-to-Rust transpilation is essential for modernizing legacy C code while enhancing safety and interoperability with modern Rust ecosystems. However, no dataset currently exists for evaluating whether a system can transpile C into safe Rust that passes a set of test cases. We introduce CRUST-Bench, a dataset of 100 C repositories, each paired with manually written interfaces in safe Rust as well as test cases that can be used to validate correctness of the transpilation. By considering entire repositories rather than isolated functions, CRUST-Bench captures the challenges of translating complex projects with dependencies across multiple files. The Rust interfaces provide explicit specifications that ensure adherence to idiomatic, memory-safe Rust patterns, while the accompanying test cases enforce functional correctness. We evaluate state-of-the-art large language models (LLMs) on this task and find that safe and idiomatic Rust generation is still a challenging problem for various state-of-the-art methods and techniques. We also provide insights into the errors LLMs usually make in transpiling code from C to safe Rust. The best-performing model, OpenAI o1, is able to solve only 15 of the 100 tasks in a single-shot setting. Improvements on CRUST-Bench would lead to improved transpilation systems that can reason about complex scenarios and help in migrating legacy codebases from C into languages like Rust that ensure memory safety. You can find the dataset and code at https://github.com/anirudhkhatry/CRUST-bench.
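
To make the task format described in the abstract more concrete, here is a minimal sketch of what a safe Rust interface paired with a test case might look like. It is an illustration only: the `IntStack` type, its methods, and the C declarations in the comment are hypothetical and are not taken from the CRUST-Bench dataset.

```rust
// Hypothetical example, NOT taken from CRUST-Bench.
//
// Imagined C original, shown for reference only:
//   typedef struct { int *data; size_t len, cap; } IntStack;
//   IntStack *stack_new(void);
//   void      stack_push(IntStack *s, int v);
//   int       stack_pop(IntStack *s, int *out);  /* returns 0 when empty */
//   void      stack_free(IntStack *s);

/// Safe Rust counterpart: the `Vec` owns the storage, so no manual
/// allocation or `stack_free` equivalent is needed.
#[derive(Debug, Default)]
pub struct IntStack {
    data: Vec<i32>,
}

impl IntStack {
    /// Creates an empty stack.
    pub fn new() -> Self {
        Self { data: Vec::new() }
    }

    /// Pushes a value; `&mut self` encodes exclusive access.
    pub fn push(&mut self, value: i32) {
        self.data.push(value);
    }

    /// Pops the top value; `Option` replaces the C status code plus out-pointer.
    pub fn pop(&mut self) -> Option<i32> {
        self.data.pop()
    }

    /// Number of stored elements.
    pub fn len(&self) -> usize {
        self.data.len()
    }

    /// Returns true when the stack holds no elements.
    pub fn is_empty(&self) -> bool {
        self.data.is_empty()
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    // A test in this style checks functional correctness of the translation.
    #[test]
    fn push_then_pop_is_lifo() {
        let mut s = IntStack::new();
        s.push(1);
        s.push(2);
        assert_eq!(s.pop(), Some(2));
        assert_eq!(s.pop(), Some(1));
        assert_eq!(s.pop(), None);
    }
}
```

In a setup like this, the signatures would be fixed in advance and a transpilation system would have to supply method bodies that compile without `unsafe` and pass the tests.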

Problem

Research questions and friction points this paper is trying to address.

Lack of dataset for evaluating C-to-safe-Rust transpilation correctness
Challenges in translating complex C projects with cross-file dependencies
Current methods struggle with safe, idiomatic Rust generation from C
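
One reason safe, idiomatic Rust is harder to produce than a line-by-line port is that relationships C leaves implicit must be stated in the types. The sketch below is a hypothetical illustration (the function `find_from` is not from the paper or the dataset): a `strstr`-style C function returns a pointer into its first argument, and the safe Rust version has to say so through a lifetime.

```rust
// Hypothetical example, NOT taken from CRUST-Bench.
//
// Imagined C original:
//   /* Returns a pointer into `haystack`, or NULL if `needle` is absent. */
//   const char *find_from(const char *haystack, const char *needle);

/// Returns the suffix of `haystack` starting at the first match of `needle`.
///
/// The lifetime `'a` ties the result to `haystack` only, so the borrow
/// checker lets `needle` be dropped while the result is still in use.
/// `Option` replaces the NULL return.
pub fn find_from<'a>(haystack: &'a str, needle: &str) -> Option<&'a str> {
    haystack.find(needle).map(|start| &haystack[start..])
}
```

With two reference parameters, lifetime elision does not apply to the return type, so a translator must work out which input the result borrows from; picking the wrong one would either fail to compile or over-constrain callers.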

Innovation

Methods, ideas, or system contributions that make the work stand out.

CRUST-Bench dataset for C-to-safe-Rust transpilation
Manually written safe Rust interfaces and test cases
Evaluates LLMs on complex repository-level transpilation