Is this Build Failure Related to my Patch? An Empirical Study of Unrelated Build Failures in Continuous Integration

📅 2026-05-06
🤖 AI Summary
This study addresses the challenge developers face in continuous integration (CI) environments when determining whether a build failure stems from their own code changes, a question that often leads to costly and unnecessary debugging. The work identifies and categorizes “unrelated build failures”: failures that are not caused by the push that triggered the build. Through an empirical analysis of 77,354 failed builds across seven Apache projects, combined with a document analysis of 371 confirmed unrelated failures and a set of 33 features (including CI latency, repeated error messages, and the number of preceding comments), the authors propose a prediction approach based on Positive and Unlabeled (PU) learning. Evaluated across the seven projects, the models achieve precision of 0.70–0.88, recall of 0.30–1.00, F1-scores of 0.44–0.91, and AUC values of 0.63–0.97, indicating that PU learning can help developers identify failures that are unlikely to be caused by their current push.
📝 Abstract
Continuous Integration (CI) systems often run many builds concurrently. In this setting, a legitimate build failure may not be caused by the code push that triggered it. Such unrelated build failures can waste developer effort because developers must determine whether the failure is actionable for their current change. We study 77,354 CI build failures from seven open source Apache projects to understand and predict unrelated build failures. We find that developers spend a median of 4 hours identifying whether a failure is related or unrelated to their push. We also perform a document analysis of 371 confirmed unrelated build failures sampled from 10,316 potentially unrelated failures. The analysis shows that unrelated test failures account for 20% of the cases in which developers classify build failures as unrelated. To predict unrelated build failures, we extract 33 features from issue reports, issue comments, and commits associated with the triggering push. We build semi-supervised Positive and Unlabeled (PU) learning models for seven Apache projects. The models achieve precision from 0.70 to 0.88, recall from 0.30 to 1.00, F1-score from 0.44 to 0.91, and AUC from 0.63 to 0.97. Feature importance analysis shows that CI latency, repeated error messages, and the number of preceding comments are useful indicators of unrelated build failures. These results show that PU learning can help developers identify build failures that are unlikely to be caused by their current push.
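To make the PU setting concrete, the sketch below shows one well-known PU technique, the labeled-frequency calibration of Elkan and Noto, followed by the precision, recall, and F1 computation used to report results. The abstract does not state which PU variant the authors use, so this is an illustration under that assumption and not the paper's method. All scores, labels, the 0.5 threshold, and the function names are hypothetical.

```python
from statistics import mean


def estimate_label_frequency(scores_on_known_positives):
    """Estimate c = P(labeled | truly unrelated) as the mean classifier score
    over held-out failures that are already confirmed as unrelated."""
    return mean(scores_on_known_positives)


def calibrate(score, c):
    """Turn P(labeled | x) into an estimate of P(unrelated | x), capped at 1."""
    return min(score / c, 1.0)


def precision_recall_f1(predicted, actual):
    """Compute precision, recall, and F1 for boolean predictions."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical scores from a classifier trained to separate labeled
# (confirmed unrelated) failures from unlabeled ones.
held_out_positive_scores = [0.5, 0.3, 0.4]
c = estimate_label_frequency(held_out_positive_scores)

# Hypothetical unlabeled failures: calibrate each score, then threshold at 0.5.
unlabeled_scores = [0.36, 0.10, 0.30, 0.04]
predicted = [calibrate(s, c) >= 0.5 for s in unlabeled_scores]

# Hypothetical ground truth, used only to illustrate the metrics.
actual = [True, False, False, True]
precision, recall, f1 = precision_recall_f1(predicted, actual)
```

The calibration step matters because a classifier trained on labeled versus unlabeled data underestimates the probability that a failure is unrelated: many unlabeled failures are unrelated too. Dividing by the estimated label frequency corrects for that under the assumption that confirmed cases are a random sample of all unrelated failures.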
Problem

Research questions and friction points this paper is trying to address.

build failure
continuous integration
unrelated failure
developer effort
code push
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unrelated Build Failures
Continuous Integration
PU Learning
Build Failure Prediction
Empirical Study
Andie Huang
University of Otago, School of Computing
Daniel Alencar da Costa
University of Otago, School of Computing
Grant Dick
University of Otago, School of Computing
Mariam El Mezouar
Assistant Professor at the Royal Military College of Canada
Mining Software Repositories, Empirical Software Engineering, Collaborative Software Development