🤖 AI Summary
Existing evaluations of C-to-Rust translation lack compact, representative function-level benchmarks. To address this, the authors propose a minimization framework for benchmark construction based on representative sampling, combining program analysis, multi-dimensional function feature extraction, clustering-based sampling, and selection of real-world functions from large-scale projects (e.g., Linux, FFmpeg). The resulting benchmark, C2RUST-BENCH, comprises 2,905 representative C functions (only 18.7% of the original 15,503) while still covering common memory-safety vulnerability patterns and translation challenges (e.g., pointer arithmetic, manual memory management, complex control flow). This compact dataset reduces the cost of both automated analysis and manual validation, providing a lightweight, extensible benchmark for cross-language migration research.
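To make the selection idea concrete: the paper's actual pipeline (program analysis plus clustering-based sampling over function features) is not spelled out here, so the sketch below uses a greedy farthest-point (k-center) heuristic over invented feature vectors as a stand-in. The function names and the three feature dimensions are purely illustrative assumptions, not data from the paper.

```python
def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def select_representatives(features, k):
    """Greedy k-center: start from the first function (sorted by name),
    then repeatedly add the function whose feature vector is farthest
    from every already-chosen representative."""
    names = sorted(features)
    chosen = [names[0]]
    while len(chosen) < k:
        best, best_dist = None, -1.0
        for name in names:
            if name in chosen:
                continue
            # Distance to the nearest already-chosen representative.
            d = min(euclidean(features[name], features[c]) for c in chosen)
            if d > best_dist:
                best, best_dist = name, d
        chosen.append(best)
    return chosen

# Hypothetical feature vectors:
# (pointer operations, malloc/free calls, cyclomatic complexity)
features = {
    "parse_header": (12, 0, 4),
    "copy_buf":     (10, 1, 3),
    "free_list":    (6, 5, 2),
    "main_loop":    (2, 0, 9),
    "checksum":     (1, 0, 2),
}
print(select_representatives(features, 3))
```

A clustering-based variant would instead group functions by feature similarity and pick one exemplar per cluster; both approaches aim at the same goal of covering the feature space with far fewer functions.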
📝 Abstract
Despite two decades of effort in vulnerability detection, memory-safety vulnerabilities remain a critical problem. Recent reports suggest that the key solution is migrating to memory-safe languages, and to this end, C-to-Rust transpilation has become a popular way to resolve memory-safety issues in C programs. Recent works propose C-to-Rust transpilation frameworks; however, a comprehensive evaluation dataset is missing. Although one solution is to assemble a sufficiently large dataset, this increases analysis time in automated frameworks and, in some cases, the manual effort required. In this work, we develop a method for selecting functions from a large pool to construct a minimized yet representative dataset for evaluating C-to-Rust transpilation. We propose C2RUST-BENCH, a set of 2,905 functions representative of C-to-Rust transpilation, selected from 15,503 functions of real-world programs.