CppPerf: An Automated Pipeline and Dataset for Performance-Improving C++ Commits

📅 2026-05-11

🤖 AI Summary
Existing C++ performance-repair benchmarks are mostly built from competitive programming code, while recent real-world benchmarks target other languages, so executable, real-world C++ evaluation data is lacking. This work proposes CppPerf-Mine, an automated pipeline that mines execution-time-improving commits from GitHub repositories, and uses it to build CppPerf-DB, a reproducible dataset of real-world C++ performance improvements. The pipeline combines structural commit filtering, LLM-based commit classification, and Dockerized build-and-test validation. CppPerf-DB comprises 347 manually verified patches from 42 mature projects, 39% of which modify multiple files. In a preliminary evaluation, OpenHands correctly fixes only 13.5% of the patches, indicating that repository-level C++ performance repair remains an open challenge.
📝 Abstract
Recent progress in automated repair of performance bugs demands realistic, executable benchmarks. However, existing C++ performance benchmarks are largely built from competitive programming submissions, and recent real-world benchmarks predominantly target Python and .NET. To fill this gap, we present CppPerf-Mine, a configurable pipeline that mines execution-time-improving patches from open-source C++ repositories on GitHub by combining structural commit filtering, an LLM-based commit classifier, and a containerized build & test stage that produces fully reproducible Docker images for each patch. Using CppPerf-Mine, we build CppPerf-DB, a benchmark comprising 347 manually verified patches from 42 mature C++ repositories, 39% of which are multi-file, enabling the evaluation of repository-level repair tools. In our preliminary study, OpenHands correctly fixes only 13.5% of the patches in CppPerf-DB, confirming that real-world C++ performance repair remains an open challenge. CppPerf-Mine and CppPerf-DB are open-source and publicly available at: https://doi.org/10.5281/zenodo.20097425. In addition, a demonstration video is available at: https://www.youtube.com/watch?v=nixlupIgSdM.
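The three stages named in the abstract (structural commit filtering, LLM-based classification, containerized build and test) can be pictured as a chain of filters. The following is a minimal sketch under stated assumptions, not the CppPerf-Mine implementation: the `Commit` record, the extension list, the `max_files` threshold, and the `classify` and `build_and_test` callables are all hypothetical stand-ins, since the abstract does not specify the actual filtering criteria, prompts, or Docker setup.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

# Assumed set of C++ file extensions; the real pipeline's list is not given.
CPP_EXTENSIONS = (".cpp", ".cc", ".cxx", ".h", ".hpp")


@dataclass
class Commit:
    """Hypothetical commit record used only for this illustration."""
    sha: str
    message: str
    changed_files: list[str]


def passes_structural_filter(commit: Commit, max_files: int = 10) -> bool:
    """Stage 1 (assumed criteria): touches C++ code and is not overly large."""
    touches_cpp = any(f.endswith(CPP_EXTENSIONS) for f in commit.changed_files)
    return touches_cpp and 0 < len(commit.changed_files) <= max_files


def is_multi_file(commit: Commit) -> bool:
    """True if the commit changes more than one C++ file."""
    return sum(f.endswith(CPP_EXTENSIONS) for f in commit.changed_files) > 1


def mine(
    commits: Iterable[Commit],
    classify: Callable[[Commit], bool],        # stage 2: stand-in for the LLM classifier
    build_and_test: Callable[[Commit], bool],  # stage 3: stand-in for the Docker build and test
) -> list[Commit]:
    """Keep only commits that pass all three stages, cheapest check first."""
    kept = []
    for commit in commits:
        if not passes_structural_filter(commit):
            continue
        if not classify(commit):
            continue
        if not build_and_test(commit):
            continue
        kept.append(commit)
    return kept
```

Ordering the stages from cheapest to most expensive means the classifier and the container build only run on commits that survive the earlier checks. In this sketch, a keyword-based lambda could stand in for `classify` during testing, with `build_and_test` returning `True` unconditionally.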
Problem

Research questions and friction points this paper is trying to address.

C++ performance bugs
automated program repair
performance-improving commits
benchmark dataset
real-world C++ repositories
Innovation

Methods, ideas, or system contributions that make the work stand out.

CppPerf-Mine
performance-improving commits
LLM-based commit classification
containerized benchmarking
CppPerf-DB