Is this Build Failure Related to my Patch? An Empirical Study of Unrelated Build Failures in Continuous Integration

📅 2026-05-06

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses a common problem in continuous integration (CI): developers must decide whether a build failure was caused by their own code change, and a wrong guess leads to unnecessary debugging. The work identifies and categorizes unrelated build failures, that is, failures not caused by the push that triggered the build. The authors analyze 77,354 failed builds from seven Apache projects, perform a document analysis of 371 confirmed unrelated failures, and extract 33 features from issue reports, issue comments, and commits. Using these features, they train Positive and Unlabeled (PU) learning models to predict unrelated failures. Across the seven projects, the models achieve precision of 0.70–0.88, recall of 0.30–1.00, F1-scores of 0.44–0.91, and AUC values of 0.63–0.97. CI latency, repeated error messages, and the number of preceding comments are among the most useful indicators.

📝 Abstract

Continuous Integration (CI) systems often run many builds concurrently. In this setting, a legitimate build failure may not be caused by the code push that triggered it. Such unrelated build failures can waste developer effort because developers must determine whether the failure is actionable for their current change. We study 77,354 CI build failures from seven open source Apache projects to understand and predict unrelated build failures. We find that developers spend a median of 4 hours identifying whether a failure is related or unrelated to their push. We also perform a document analysis of 371 confirmed unrelated build failures sampled from 10,316 potentially unrelated failures. The analysis shows that unrelated test failures account for 20% of the cases in which developers classify build failures as unrelated. To predict unrelated build failures, we extract 33 features from issue reports, issue comments, and commits associated with the triggering push. We build semi-supervised Positive and Unlabeled (PU) learning models for seven Apache projects. The models achieve precision from 0.70 to 0.88, recall from 0.30 to 1.00, F1-score from 0.44 to 0.91, and AUC from 0.63 to 0.97. Feature importance analysis shows that CI latency, repeated error messages, and the number of preceding comments are useful indicators of unrelated build failures. These results show that PU learning can help developers identify build failures that are unlikely to be caused by their current push.
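
The abstract does not say which PU learning algorithm the authors use. As a rough illustration only, the sketch below shows one well-known PU technique, the label-frequency correction of Elkan and Noto, followed by the precision, recall, and F1 computation used to report results. It assumes a classifier has already been trained to separate labeled positives (confirmed unrelated failures) from unlabeled builds, and that positives were labeled at random. All function names, scores, labels, and the 0.5 threshold are hypothetical stand-ins, not the paper's data or implementation.

```python
def estimate_label_frequency(scores_on_known_positives):
    """Estimate c = P(labeled | truly unrelated) as the mean classifier
    score on held-out builds that are confirmed unrelated."""
    if not scores_on_known_positives:
        raise ValueError("need at least one labeled positive")
    return sum(scores_on_known_positives) / len(scores_on_known_positives)


def corrected_probability(score, c):
    """Rescale P(labeled | x) into an estimate of P(unrelated | x).
    The result is capped at 1.0 because the ratio can overshoot."""
    return min(1.0, score / c)


def precision_recall_f1(predicted, actual):
    """Compute precision, recall, and F1 from two parallel boolean lists."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical "labeled vs. unlabeled" scores for held-out confirmed positives.
holdout_positive_scores = [0.5, 0.25, 0.75]
c = estimate_label_frequency(holdout_positive_scores)

# Hypothetical scores for unlabeled failed builds, corrected and thresholded.
unlabeled_scores = [0.5, 0.125, 0.375, 0.0625]
predicted = [corrected_probability(s, c) >= 0.5 for s in unlabeled_scores]

# Hypothetical ground truth, available only in an evaluation setting.
actual = [True, False, False, True]
precision, recall, f1 = precision_recall_f1(predicted, actual)
print(f"c={c:.2f} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Dividing by c matters because a classifier trained on labeled versus unlabeled data systematically underestimates the chance that a build is unrelated: many unlabeled builds are in fact unrelated failures that nobody confirmed.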

Problem

Research questions and friction points this paper is trying to address.

build failure

continuous integration

unrelated failure

developer effort

code push

Innovation

Methods, ideas, or system contributions that make the work stand out.

Unrelated Build Failures

Continuous Integration

PU Learning