🤖 AI Summary
Existing bytecode-based mutation testing tools, such as PIT, do not readily produce structured, source-level defect data, which limits their usefulness for LLM-driven software engineering tasks. This work proposes an approach that combines the XML metadata generated by PIT with debug information from compiled Java class files to automatically locate and reconstruct the source code edit corresponding to each mutant. This enables fully automated reconstruction of readable, reproducible source-level defect samples from bytecode-level mutants. The method has been applied to eight open-source Java projects to build a structured dataset of paired correct and defective code snippets, enriched with documentation context and metadata. The dataset supports training and evaluation for downstream tasks such as bug localization, repair, and test generation, while reducing the risk of training data contamination.
📝 Abstract
LLM-based software engineering increasingly depends on executable, context-rich bug artifacts: paired correct and buggy code, methods under test (MUTs), documentation, and metadata. These artifacts support the training and evaluation of automated bug localization and repair techniques, testing and test oracle generation methods, and documentation-driven automation. Although curated benchmarks (e.g., Defects4J) remain valuable, they are static and increasingly vulnerable to contamination as code models are trained on large public corpora. A complementary strategy is to generate fresh, cutoff-aware datasets by selecting real system versions and injecting controlled bugs at the source level.
Mutation testing is a natural basis for this strategy: it applies predefined mutation operators to programs and records whether the existing test suite detects each injected change. PIT is a state-of-the-practice mutation testing tool for Java that performs mutation at the bytecode level. This design makes mutation testing fast and practical, but PIT reports mutants primarily through XML, making them difficult to inspect, replay, or reuse as structured source-level dataset records. To address this gap, we present PITMuS, which combines PIT XML metadata with debug information from compiled Java class files to localize and reconstruct the source edit corresponding to each mutant. PITMuS then automatically produces structured datasets containing source-level buggy and fixed code pairs, documentation context, and metadata for downstream training and evaluation. Although we evaluate PITMuS on eight open-source Java systems, it can be applied to any Java system where PIT can be integrated.
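To make the idea of turning a PIT report entry into a source-level buggy/fixed pair concrete, here is a minimal sketch in Python. It is not the PITMuS implementation: the sample XML follows the general shape of a PIT `mutations.xml` entry but its values are made up, the `REWRITES` table and the `MutantRecord` type are hypothetical names introduced for illustration, and the line-level text substitution stands in for the class-file debug information that PITMuS uses to localize the edit.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Sample shaped like a PIT mutations.xml entry (values are invented).
SAMPLE_XML = """<mutations>
  <mutation detected='true' status='KILLED'>
    <sourceFile>Calc.java</sourceFile>
    <mutatedClass>demo.Calc</mutatedClass>
    <mutatedMethod>add</mutatedMethod>
    <lineNumber>3</lineNumber>
    <mutator>org.pitest.mutationtest.engine.gregor.mutators.MathMutator</mutator>
    <description>Replaced integer addition with subtraction</description>
  </mutation>
</mutations>"""

SAMPLE_SOURCE = """public class Calc {
    int add(int a, int b) {
        return a + b;
    }
}"""


@dataclass
class MutantRecord:
    """One dataset row: where the mutant lives and the fixed/buggy line pair."""
    source_file: str
    method: str
    line: int
    status: str
    fixed_line: str
    buggy_line: str


# Hypothetical lookup: mutant description -> (original token, mutated token).
REWRITES = {
    "Replaced integer addition with subtraction": (" + ", " - "),
}


def reconstruct(xml_text: str, source_text: str) -> list[MutantRecord]:
    """Pair each supported mutant with a reconstructed source-level edit."""
    lines = source_text.splitlines()
    records = []
    for mutation in ET.fromstring(xml_text).iter("mutation"):
        rewrite = REWRITES.get(mutation.findtext("description"))
        if rewrite is None:
            continue  # operator not covered by this sketch
        line_no = int(mutation.findtext("lineNumber"))
        fixed = lines[line_no - 1]  # reported line numbers are 1-based
        old, new = rewrite
        if fixed.count(old) != 1:
            continue  # ambiguous or absent: text alone cannot localize the edit
        records.append(
            MutantRecord(
                source_file=mutation.findtext("sourceFile"),
                method=mutation.findtext("mutatedMethod"),
                line=line_no,
                status=mutation.get("status"),
                fixed_line=fixed,
                buggy_line=fixed.replace(old, new),
            )
        )
    return records


records = reconstruct(SAMPLE_XML, SAMPLE_SOURCE)
for record in records:
    print(record.line, record.fixed_line.strip(), "->", record.buggy_line.strip())
```

The sketch skips any line where the operator appears more than once, because the report's line number alone does not say which occurrence was mutated. That ambiguity is one reason a purely textual approach falls short and additional information from the compiled class files is useful.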