🤖 AI Summary
Conventional unit tests check only individual input-output pairs, leaving most inputs along an execution path untested. Property-based testing (PBT) broadens input coverage but relies on manually specified properties, which entails high engineering overhead. This paper proposes a program-semantics-driven method for automated test generalization: it statically extracts semantic specifications from Java bytecode via single-path symbolic execution, without generalizing from input-output examples or requiring human-defined properties, and transforms JUnit tests into jqwik property-based tests end to end. The approach combines symbolic analysis, specification extraction, and PBT framework integration, thereby circumventing the property-definition bottleneck. On EvoSuite-generated tests for EqBench and Apache Commons utilities, mutation scores improve by 1–4 percentage points. Empirical analysis identifies incomplete type support in symbolic analysis and limitations of the prototype's static analysis as the primary current bottlenecks.
📝 Abstract
Conventional unit tests validate single input-output pairs, leaving most inputs of an execution path untested. Property-based testing addresses this shortcoming by generating multiple inputs satisfying properties but requires significant manual effort to define properties and their constraints. We propose a semantics-based approach that automatically transforms unit tests into property-based tests by extracting specifications from implementations via single-path symbolic analysis. We demonstrate this approach through Teralizer, a prototype for Java that transforms JUnit tests into property-based jqwik tests. Unlike prior work that generalizes from input-output examples, Teralizer derives specifications from program semantics.
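To make the idea concrete, the following is a minimal sketch in plain Java of what generalizing a single-example test along one execution path could look like. It is not Teralizer's output and does not use JUnit or jqwik: the method `absDiff`, the path condition, the input range, and all helper names are hypothetical and chosen only for illustration. In jqwik, the sampling loop would roughly correspond to a `@Property` method with `@ForAll` parameters, with the path condition acting as a precondition on generated inputs.

```java
import java.util.Random;

/** Illustrative sketch only; every name here is a hypothetical assumption. */
final class GeneralizationSketch {

    // Hypothetical method under test with two execution paths.
    static int absDiff(int a, int b) {
        if (a >= b) {
            return a - b; // path 1
        }
        return b - a;     // path 2
    }

    // Original single-example check, in the spirit of one JUnit assertion.
    // It exercises only path 1, and only for the input (5, 3).
    static boolean exampleTest() {
        return absDiff(5, 3) == 2;
    }

    // Path condition of the path taken by exampleTest(): a >= b.
    // (Assumed here by hand; a symbolic analysis would derive it.)
    static boolean pathCondition(int a, int b) {
        return a >= b;
    }

    // Expected output on that path, expressed over the inputs.
    static int expectedOnPath(int a, int b) {
        return a - b;
    }

    // Generalized check: sample inputs, keep those that satisfy the path
    // condition, and compare the result against the path's specification.
    // Inputs are limited to [-1000, 1000] to avoid integer overflow.
    static int generalizedTest(long seed, int tries) {
        Random rnd = new Random(seed);
        int checked = 0;
        for (int i = 0; i < tries; i++) {
            int a = rnd.nextInt(2001) - 1000;
            int b = rnd.nextInt(2001) - 1000;
            if (!pathCondition(a, b)) {
                continue; // input follows a different path; skip it
            }
            if (absDiff(a, b) != expectedOnPath(a, b)) {
                throw new AssertionError("Property violated for a=" + a + ", b=" + b);
            }
            checked++;
        }
        return checked; // number of inputs actually checked on this path
    }

    public static void main(String[] args) {
        System.out.println("example test passed: " + exampleTest());
        System.out.println("inputs checked on path: " + generalizedTest(42L, 1000));
    }
}
```

The single example covers one point of the path `a >= b`, whereas the generalized check covers many inputs on that same path, which is where additional mutant-killing power could come from.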
We evaluated Teralizer on three progressively challenging datasets. On EvoSuite-generated tests for EqBench and Apache Commons utilities, Teralizer improved mutation scores by 1–4 percentage points. Generalizing mature developer-written tests from Apache Commons utilities yielded an improvement of only 0.05–0.07 percentage points. Analysis of 632 real-world Java projects from RepoReapers highlights applicability barriers: only 1.7% of projects completed the generalization pipeline, with failures primarily due to type support limitations in symbolic analysis and static analysis limitations in our prototype. Based on these results, we provide a roadmap for future work, identifying research and engineering challenges that need to be tackled to advance the field of test generalization.
Artifacts available at: https://doi.org/10.5281/zenodo.17950381