🤖 AI Summary
This work addresses the challenge of detecting and repairing compilation errors caused by feature interactions in configurable C systems, a class of errors that traditional compilers and existing variability-aware tools struggle to handle effectively. We present the first systematic exploration of leveraging foundation models for this task, proposing a variability-aware error detection and repair approach based on GPT-OSS-20B and Gemini 3 Pro. Our method is evaluated across synthetic systems, real-world GitHub commits, and mutation testing scenarios. Experimental results demonstrate that GPT-OSS-20B achieves a precision of 0.97, a recall of 0.90, and an accuracy of 0.94 on small configurable systems, and successfully repairs over 70% of the errors. Notably, it also uncovers potential compilation defects in real Linux commits, offering a low-overhead, high-coverage complement to variability-aware compilation analysis.
📝 Abstract
Modern software systems often rely on conditional compilation to support optional features and multiple deployment scenarios. In such configurable systems, compilation errors may arise only under specific combinations of features, remaining hidden during development and testing. These variability-induced errors are difficult to detect in practice: traditional compilers analyze only a single configuration at a time, while existing variability-aware tools typically require complex setup and incur high analysis costs. In this article, we present an empirical study on the use of foundation models to detect and fix compilation errors caused by feature variability in configurable C systems. We evaluate GPT-OSS-20B and GEMINI 3 PRO, and compare them with TYPECHEF, a state-of-the-art variability-aware parser. Our evaluation considers three complementary settings: 5,000 small configurable systems designed to systematically exercise variability-induced compilation behavior, comprising systems both with and without compilation errors; 14 real-world GitHub commits; and 42 mutation testing scenarios. Our results show that foundation models can effectively identify variability-induced compilation errors. On the small configurable systems, GPT-OSS-20B achieved a precision of 0.97, a recall of 0.90, and an accuracy of 0.94, substantially increasing detection coverage compared to TYPECHEF and performing comparably to GEMINI 3 PRO. For compilation-error repair, GPT-OSS-20B produced compilable fixes in over 70% of the cases. In the analysis of real commits, CHATGPT-5.2 detected all but two of the injected faults and identified a potential real compilation bug in a Linux commit with more than 1,000 modified lines. Our findings indicate that current state-of-the-art foundation models provide a practical and low-effort complement to traditional variability-aware analyses.