Leveraging Commit Size Context and Hyper Co-Change Graph Centralities for Defect Prediction

📅 2026-04-01
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses two limitations of traditional file-level defect prediction models: they often overlook the effect of commit size (the number of files modified in a single commit) on software quality, and they struggle to capture higher-order change semantics. The authors propose a commit-size-aware approach that recasts scalar process metrics as 100-dimensional vectors incorporating commit size, and models change history as a hyper co-change graph whose hyperedges naturally encode size information. File importance is then quantified via vector centrality measures computed on this hypergraph. An empirical evaluation on nine long-lived Apache projects with five classifiers shows that the proposed method consistently outperforms the baselines, with statistically significant improvements in predictive performance, discriminative power, and calibration.
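The hyper co-change graph idea can be illustrated with a minimal Python sketch. It treats every commit as one hyperedge over the files it touched and computes a size-stratified degree vector per file. This is only an assumed stand-in for a vector centrality; the summary does not say which centralities the authors use, and the function names, the `max_size` cap, and the sample commits are all hypothetical.

```python
from collections import defaultdict


def build_hyper_cochange_graph(commits):
    """Return one hyperedge (a frozenset of files) per non-empty commit."""
    return {cid: frozenset(files) for cid, files in commits.items() if files}


def size_stratified_degree(hyperedges, max_size=5):
    """For each file, count incident hyperedges per commit size.

    Slot k-1 holds the number of commits of size k that touched the file;
    commits larger than max_size share the last slot (an assumed choice).
    """
    centrality = defaultdict(lambda: [0] * max_size)
    for files in hyperedges.values():
        slot = min(len(files), max_size) - 1
        for path in files:
            centrality[path][slot] += 1
    return dict(centrality)


# Hypothetical change history: commit id -> files changed together.
commits = {
    "c1": ["A.java"],
    "c2": ["A.java", "B.java"],
    "c3": ["A.java", "B.java", "C.java"],
    "c4": ["B.java", "C.java"],
}
graph = build_hyper_cochange_graph(commits)
vectors = size_stratified_degree(graph, max_size=3)
```

Unlike a pairwise co-change graph, each hyperedge keeps the full set of files in a commit, so the commit size is available directly as the hyperedge's cardinality.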
๐Ÿ“ Abstract
File-level defect prediction models traditionally rely on product and process metrics. While process metrics effectively complement product metrics, they often overlook commit size, the number of files changed per commit, despite its strong association with software quality. Network centrality measures on dependency graphs have also proven to be valuable product-level indicators. Motivated by this, we first redefine process metrics as commit-size-aware process metric vectors, transforming conventional scalar measures into 100-dimensional profiles that capture the distribution of changes across commit-size strata. We then model change history as a hyper co-change graph, where hyperedges naturally encode commit-size semantics. Vector centralities computed on these hypergraphs quantify size-aware node importance for source files. Experiments on nine long-lived Apache projects using five popular classifiers show that replacing scalar process metrics with the proposed commit-size-aware vectors, alongside product metrics, consistently improves predictive performance. These findings establish that commit-size-aware process metrics and hypergraph-based vector centralities capture higher-order change semantics, leading to more discriminative, better calibrated, and statistically superior defect prediction models.
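As a rough illustration of turning a scalar process metric into a 100-dimensional profile, the sketch below spreads a file's revision count across commit-size strata. The stratification is an assumption made here for illustration (one stratum per commit size, with sizes of 100 or more sharing the last stratum); the abstract does not state how strata are defined, and `commit_size_profile` and the sample history are hypothetical.

```python
DIMENSIONS = 100  # profile length stated in the abstract


def commit_size_profile(file_path, commits, dims=DIMENSIONS):
    """Count the commits touching file_path, bucketed by commit size.

    Index k-1 counts commits that changed exactly k files; commits that
    changed dims or more files fall into the last bucket (assumed scheme).
    """
    profile = [0] * dims
    for files in commits:
        changed = set(files)
        if file_path in changed:
            stratum = min(len(changed), dims) - 1
            profile[stratum] += 1
    return profile


# Hypothetical history: each entry lists the files changed by one commit.
large_commit = ["A.java"] + [f"Gen{i}.java" for i in range(149)]
history = [
    ["A.java"],
    ["A.java", "B.java"],
    ["A.java", "B.java"],
    large_commit,
    ["B.java"],
]
profile = commit_size_profile("A.java", history)
revision_count = sum(profile)  # recovers the conventional scalar metric
```

Summing the profile gives back the ordinary revision count, so under this assumed scheme the vector refines the scalar metric rather than replacing its information.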
Problem

Research questions and friction points this paper is trying to address.

defect prediction
commit size
process metrics
hyper co-change graph
software quality
Innovation

Methods, ideas, or system contributions that make the work stand out.

commit size awareness
hyper co-change graph
vector centrality
defect prediction
process metrics