Critical Considerations on Effort-aware Software Defect Prediction Metrics

📅 2025-04-27
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work challenges the common assumption in defect prediction that analysis effort scales linearly with lines of code (LOC) by systematically investigating the sensitivity of Effort-Aware Metrics (EAMs) to the choice of effort proxy, comparing LOC against alternatives such as McCabe's cyclomatic complexity. Through formal mathematical analysis and cross-project empirical evaluation, the authors demonstrate, for the first time both theoretically and empirically, that EAM outcomes depend critically on the chosen effort proxy, revealing that current EAMs are intrinsically *size-aware* rather than genuinely *effort-aware*. Model rankings and the conclusions drawn from EAMs differ substantially under different effort assumptions, and the extent of these discrepancies varies widely across projects. These results expose a fundamental conceptual flaw in current EAMs: they conflate code size with development effort. The study thus provides both theoretical grounding and empirical evidence to motivate the design of truly effort-sensitive evaluation paradigms in software defect prediction.

📝 Abstract
Background. Effort-aware metrics (EAMs) are widely used to evaluate the effectiveness of software defect prediction models while accounting for the effort needed to analyze the software modules that are estimated defective. The usual underlying assumption is that this effort is proportional to the modules' size measured in LOC. However, the research on module analysis (including code understanding, inspection, testing, etc.) suggests that module analysis effort may be better correlated with code attributes other than size. Aim. We investigate whether assuming that module analysis effort is proportional to code metrics other than LOC leads to different evaluations. Method. We show mathematically that the choice of the code measure used as the module effort driver crucially influences the resulting evaluations. To illustrate the practical consequences of this, we carried out a demonstrative empirical study in which the same model was evaluated via EAMs, assuming that effort is proportional to either McCabe's complexity or LOC. Results. The empirical study showed that EAMs depend on the underlying effort model and can give quite different indications when effort is modeled differently. It is also apparent that the extent of these differences varies widely. Conclusions. Researchers and practitioners should be aware that the reliability of the indications provided by EAMs depends on the nature of the underlying effort model. The EAMs used until now appear to be actually size-aware, rather than effort-aware: when analysis effort does not depend on size, these EAMs can be misleading.
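The sensitivity described in the Method can be illustrated with a minimal sketch. The snippet below computes Recall@20% effort (one common effort-aware metric: inspect modules in descending order of predicted risk until 20% of the total effort proxy is consumed, then measure the fraction of defects found) under two different effort proxies. The module data, probabilities, and the specific cut-off rule are invented for illustration, not taken from the paper:

```python
def recall_at_effort(modules, effort_key, budget_fraction=0.2):
    """Recall at a fixed effort budget: inspect modules in order of
    predicted defect probability until cumulative effort (as measured
    by `effort_key`) would exceed the budget; return defects found
    as a fraction of all defects."""
    total_effort = sum(m[effort_key] for m in modules)
    total_defects = sum(m["defective"] for m in modules)
    budget = budget_fraction * total_effort
    found, spent = 0, 0
    # Rank by predicted defect probability, highest risk first.
    for m in sorted(modules, key=lambda m: m["prob"], reverse=True):
        if spent + m[effort_key] > budget:
            break
        spent += m[effort_key]
        found += m["defective"]
    return found / total_defects

# Hypothetical modules: a large but simple module (A) ranks first,
# a small but complex one (B) ranks second.
modules = [
    {"name": "A", "loc": 1000, "mccabe": 5,  "defective": 1, "prob": 0.9},
    {"name": "B", "loc": 100,  "mccabe": 40, "defective": 1, "prob": 0.8},
    {"name": "C", "loc": 300,  "mccabe": 10, "defective": 0, "prob": 0.6},
    {"name": "D", "loc": 200,  "mccabe": 3,  "defective": 1, "prob": 0.4},
]

print(recall_at_effort(modules, "loc"))     # effort proportional to size
print(recall_at_effort(modules, "mccabe"))  # effort proportional to complexity
```

With this toy data the same ranking yields recall 0.0 under the LOC proxy (module A alone exhausts the budget) but 1/3 under the complexity proxy (A is cheap to analyze, so one defect is found), showing how the effort assumption, not the prediction model, drives the metric.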
Problem

Research questions and friction points this paper is trying to address.

Investigates whether non-LOC metrics better reflect module analysis effort
Examines how effort model choice affects defect prediction evaluations
Reveals current EAMs are size-aware rather than truly effort-aware
Innovation

Methods, ideas, or system contributions that make the work stand out.

Investigates effort models beyond LOC metrics
Mathematically analyzes impact of effort drivers
Empirically compares McCabe complexity vs LOC