Flow Graph-Based Classification of Defects4J Faults

📅 2025-02-04
🤖 AI Summary
Problem: Existing defect classification schemes rely heavily on repair actions as proxies and thus fail to capture the intrinsic, structural nature of faults. Method: This paper proposes a new, direct fault classification framework grounded in control-flow graphs (CFGs) and data-flow graphs (DFGs). It defines six control-flow fault classes and two data-flow fault classes, yielding a graph-structure-driven defect taxonomy that does not depend on repair actions. Using static program analysis and fault attribution modeling, the authors apply the framework to 488 real-world faults from seven Defects4J projects. Results: The data-flow class “definition fault” is the most common individual class, yet over 70% of faults exhibit at least one control-flow fault class; the majority of faults are assigned between one and three classes. The authors suggest the taxonomy can support more precise fault localization and targeted automated program repair, and that it can be applied to other fault datasets, offering a semantics-aware basis for defect analysis independent of fix patterns.

📝 Abstract
Software fault datasets such as Defects4J provide for each individual fault its location and repair, but do not characterize the faults. Current classifications use the repairs as proxies, which does not capture the intrinsic nature of the fault. In this paper, we propose a new, direct fault classification scheme based on the control- and data-flow graph representations of the program. Our scheme comprises six control-flow and two data-flow fault classes. We apply this to 488 faults from seven projects in the Defects4J dataset. We find that one of the data-flow fault classes (definition fault) is the most common individual class but that the majority of faults are classified with at least one control-flow fault class. The majority of the faults are assigned between one and three classes. Our proposed classification can be applied to other fault datasets and can be used to improve fault localization and automated program repair techniques for specific fault classes.
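
The abstract reports aggregate statistics over a multi-label classification, in which each fault may receive several of the eight classes. The Python sketch below shows one way such statistics could be computed from per-fault labels; it is an illustration, not the authors' tooling. Assumptions: only “definition” is a class name taken from the abstract; `control_1` to `control_6` and `data_other` are hypothetical placeholders for the unnamed classes, and `ClassifiedFault`, `summarize`, and the toy faults are invented for this example, not Defects4J data.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class FlowKind(Enum):
    CONTROL = "control-flow"
    DATA = "data-flow"


# Hypothetical catalogue: six control-flow and two data-flow classes, matching
# the counts in the abstract. Only "definition" is a real class name; the other
# names are placeholders.
FAULT_CLASSES = {
    **{f"control_{i}": FlowKind.CONTROL for i in range(1, 7)},
    "definition": FlowKind.DATA,
    "data_other": FlowKind.DATA,
}


@dataclass(frozen=True)
class ClassifiedFault:
    fault_id: str
    classes: frozenset  # multi-label: one fault may carry several classes

    def __post_init__(self):
        # Reject labels outside the (hypothetical) catalogue.
        unknown = self.classes.difference(FAULT_CLASSES)
        if unknown:
            raise ValueError(f"unknown fault classes: {sorted(unknown)}")


def summarize(faults):
    """Aggregate statistics of the kind the abstract reports."""
    n = len(faults)
    # How often each individual class is assigned across all faults.
    per_class = Counter(c for f in faults for c in f.classes)
    # Faults carrying at least one control-flow class.
    with_control = sum(
        any(FAULT_CLASSES[c] is FlowKind.CONTROL for c in f.classes) for f in faults
    )
    # Faults assigned between one and three classes (inclusive).
    one_to_three = sum(1 <= len(f.classes) <= 3 for f in faults)
    return {
        "most_common_class": per_class.most_common(1)[0][0] if per_class else None,
        "share_with_control_flow_class": with_control / n if n else 0.0,
        "share_with_1_to_3_classes": one_to_three / n if n else 0.0,
    }


if __name__ == "__main__":
    # Invented toy labels for illustration only; not results from the paper.
    toy = [
        ClassifiedFault("toy-1", frozenset({"definition"})),
        ClassifiedFault("toy-2", frozenset({"definition", "control_1"})),
        ClassifiedFault(
            "toy-3", frozenset({"definition", "control_2", "control_3", "control_4"})
        ),
    ]
    print(summarize(toy))
```

Storing `classes` as a set rather than a single label reflects the abstract's point that a fault can belong to several classes at once, which is what makes statistics such as “at least one control-flow class” and “between one and three classes” meaningful.
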
Problem

Classify software faults directly
Use control-flow and data-flow graphs
Improve fault localization and repair techniques
Innovation

Flow graph-based classification
Control-flow and data-flow fault classes
Potential to improve fault localization and repair