CXXCrafter: An LLM-Based Agent for Automated C/C++ Open Source Software Building

📅 2025-05-27
🤖 AI Summary
Build automation for C/C++ open-source projects faces core challenges including intricate dependency graphs, heterogeneous build systems (e.g., Make, CMake, Autotools, Bazel), diverse toolchains, and poor error resilience, areas where existing approaches offer limited support. To address these, we propose the first large language model (LLM)-based dynamic interactive build agent framework. It enables end-to-end adaptive build repair via implicit knowledge reasoning and closed-loop environmental feedback. We design a unified abstraction layer to support multiple build systems and integrate context-aware error diagnosis, intelligent retry mechanisms, and recovery strategies grounded in real-time build-state awareness. Evaluated on open-source software, our framework achieves a 78% build success rate; on the Top100 dataset, it builds 3 projects that manual efforts did not, while 14 projects are built only manually. This work establishes a scalable, LLM-driven paradigm for C/C++ ecosystem build automation.

📝 Abstract
Project building is pivotal to supporting various program analysis tasks, such as generating intermediate representation code for static analysis and preparing binary code for vulnerability reproduction. However, automating the building process for C/C++ projects is highly complex and involves substantial technical challenges, such as intricate dependency management, diverse build systems, varied toolchains, and multifaceted error handling mechanisms. Consequently, building C/C++ projects often proves difficult in practice, hindering the progress of downstream applications. Unfortunately, research on facilitating the building of C/C++ projects remains inadequate. The emergence of Large Language Models (LLMs) offers promising solutions for automated software building. Trained on extensive corpora, LLMs can help unify diverse build systems through their comprehension capabilities and address complex errors by leveraging their stored tacit knowledge. Moreover, LLM-based agents can be systematically designed to interact dynamically with the environment, effectively managing issues that arise during the build. Motivated by these opportunities, we first conduct an empirical study to systematically analyze the current challenges in the C/C++ project building process. In particular, we observe that most popular C/C++ projects encounter an average of five errors when relying solely on the default build systems. Based on our study, we develop an automated build system called CXXCrafter to specifically address the above-mentioned challenges, such as dependency resolution. Our evaluation on open-source software demonstrates that CXXCrafter achieves a success rate of 78% in project building. Specifically, among the Top100 dataset, 72 projects are built successfully by both CXXCrafter and manual efforts, 3 by CXXCrafter only, and 14 manually only. ...
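
The Top100 counts quoted in the abstract can be combined into per-method totals. The short sketch below does that arithmetic; it assumes the Top100 dataset contains exactly 100 projects, which the name suggests but the abstract does not state, and the variable names are illustrative.

```python
# Counts quoted in the abstract for the Top100 dataset.
both = 72           # built by both CXXCrafter and manual efforts
crafter_only = 3    # built by CXXCrafter only
manual_only = 14    # built manually only

# Assumption: the dataset holds exactly 100 projects (not stated in the abstract).
total = 100

crafter_total = both + crafter_only                      # projects CXXCrafter built
manual_total = both + manual_only                        # projects built manually
neither = total - (both + crafter_only + manual_only)    # projects built by neither

print(crafter_total, manual_total, neither)
```

Under that assumption, CXXCrafter builds 75 of the Top100 projects, manual efforts build 86, and 11 are built by neither.
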
Problem

Research questions and friction points this paper is trying to address.

Automating C/C++ project building is complex due to dependencies and errors
Existing methods struggle with diverse build systems and toolchains
LLM-based agents can dynamically resolve build issues and dependencies
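
To make the "diverse build systems" point concrete, here is a minimal sketch of one common heuristic: guessing which build systems a source tree uses from well-known marker files. This is not CXXCrafter's method, which the summary describes as relying on LLM comprehension; the `MARKERS` table and the `detect_build_systems` helper are hypothetical and deliberately incomplete.

```python
from pathlib import Path

# Marker files conventionally associated with each build system.
# Illustrative only; real projects may nest or generate these files.
MARKERS = {
    "Bazel": ["WORKSPACE", "WORKSPACE.bazel", "MODULE.bazel"],
    "CMake": ["CMakeLists.txt"],
    "Autotools": ["configure.ac", "configure.in", "configure"],
    "Make": ["Makefile", "GNUmakefile", "makefile"],
}


def detect_build_systems(root):
    """Return the names of build systems whose marker files exist in `root`."""
    root = Path(root)
    return [
        name
        for name, files in MARKERS.items()
        if any((root / f).is_file() for f in files)
    ]
```

A tree can match several entries at once (for example, a CMake project that also ships a hand-written Makefile), which is part of why a fixed lookup table like this falls short.
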
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-based agent for automated C/C++ building
Unifies diverse build systems via LLM comprehension
Dynamically interacts to manage build issues
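
The closed-loop idea behind these points can be sketched as a retry loop: run the build, and on failure pass the log to a component that proposes a revised command. This is a minimal sketch under stated assumptions, not CXXCrafter's implementation. The `suggest_fix` callback is a hypothetical stand-in for the LLM, and `build_with_repair` and `max_attempts` are illustrative names.

```python
import subprocess


def run(cmd, cwd):
    """Run a shell command and return (succeeded, combined output)."""
    proc = subprocess.run(cmd, shell=True, cwd=cwd, capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr


def build_with_repair(cmd, cwd, suggest_fix, max_attempts=5):
    """Retry a build, letting `suggest_fix` revise the command after each failure.

    `suggest_fix(cmd, log)` is a hypothetical hook (e.g. an LLM call) that
    returns a new command string, or None to give up.
    """
    history = []
    for _ in range(max_attempts):
        ok, log = run(cmd, cwd)
        history.append((cmd, ok))
        if ok:
            return True, history
        cmd = suggest_fix(cmd, log)  # feed the error log back for a repair
        if cmd is None:
            break
    return False, history
```

The returned `history` records each attempted command and whether it succeeded, which makes it easy to see how many repair rounds a project needed.
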
Zhengmin Yu
Fudan University, China
Yuan Zhang
Fudan University, China
Ming Wen
Huazhong University of Science and Technology, China
Yinan Nie
Fudan University, China
Wenhui Zhang
Researcher/Software Engineer
Infrastructure and System
Min Yang
Bytedance
Vision Language Model, Computer Vision, Video Understanding