🤖 AI Summary
Existing evaluations of C-to-Rust translation lack compact, representative function-level benchmarks. To address this, the authors propose a minimization framework for benchmark construction based on representative sampling, combining program analysis, multi-dimensional function feature extraction, clustering-based sampling, and selection of real-world functions from large-scale projects (e.g., Linux, FFmpeg). The resulting benchmark, C2RUST-BENCH, comprises 2,905 representative C functions (only 18.7% of the original 15,503) while still covering common memory-safety vulnerability patterns and translation challenges (e.g., pointer arithmetic, manual memory management, complex control flow). This compact dataset reduces the cost of both automated analysis and manual validation, providing a lightweight, extensible benchmark for cross-language migration research.
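To make the selection idea concrete: the paper's actual pipeline (program analysis plus clustering-based sampling over function features) is not spelled out here, so the sketch below uses a greedy farthest-point (k-center) heuristic over invented feature vectors as a stand-in. The function names and the three feature dimensions are purely illustrative assumptions, not data from the paper.

```python
def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def select_representatives(features, k):
    """Greedy k-center: start from the first function (sorted by name),
    then repeatedly add the function whose feature vector is farthest
    from every already-chosen representative."""
    names = sorted(features)
    chosen = [names[0]]
    while len(chosen) < k:
        best, best_dist = None, -1.0
        for name in names:
            if name in chosen:
                continue
            # Distance to the nearest already-chosen representative.
            d = min(euclidean(features[name], features[c]) for c in chosen)
            if d > best_dist:
                best, best_dist = name, d
        chosen.append(best)
    return chosen

# Hypothetical feature vectors:
# (pointer operations, malloc/free calls, cyclomatic complexity)
features = {
    "parse_header": (12, 0, 4),
    "copy_buf":     (10, 1, 3),
    "free_list":    (6, 5, 2),
    "main_loop":    (2, 0, 9),
    "checksum":     (1, 0, 2),
}
print(select_representatives(features, 3))
```

A clustering-based variant would instead group functions by feature similarity and pick one exemplar per cluster; both approaches aim at the same goal of covering the feature space with far fewer functions.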
📝 Abstract
Despite two decades of effort in vulnerability detection, memory-safety vulnerabilities remain a critical problem. Recent reports suggest that the key solution is migrating to memory-safe languages, and to this end, C-to-Rust transpilation has become a popular way to resolve memory-safety issues in C programs. Recent works propose C-to-Rust transpilation frameworks; however, a comprehensive evaluation dataset is missing. Although one solution is to assemble a sufficiently large dataset, this increases analysis time in automated frameworks and, in some cases, the manual effort required. In this work, we develop a method for selecting functions from a large pool to construct a minimized yet representative dataset for evaluating C-to-Rust transpilation. We propose C2RUST-BENCH, a set of 2,905 functions representative of C-to-Rust transpilation, selected from 15,503 functions of real-world programs.