🤖 AI Summary
Existing code coverage metrics inadequately reflect a test suite’s ability to validate program behavior, while mutation testing incurs prohibitively high computational overhead. To address this, we propose Metamorphic Coverage (MC), a novel coverage metric tailored to metamorphic testing. MC quantifies test sensitivity to potential faults by measuring the code executed by one input of a metamorphic pair but not the other. It is the first systematic formulation and empirical evaluation of a coverage criterion explicitly designed for metamorphic relations. We validate MC across diverse systems, including database engines, compilers, and constraint solvers. On 64 real-world bugs, the code measured by MC overlaps with the bug-fix locations of 78.1% (50 of 64); MC is four times as sensitive as line coverage at distinguishing testing methods, while its mean value is only one-sixth that of line coverage. Moreover, MC’s computational cost is merely 0.28% of that of full mutation testing, and when used for feedback guidance it improves bug detection by 41% over conventional coverage metrics.
📝 Abstract
Metamorphic testing is a widely used methodology that examines an expected relation between pairs of executions to automatically find bugs, such as correctness bugs. We found that code coverage cannot accurately measure the extent to which code is validated, and that mutation testing is computationally expensive for evaluating metamorphic testing methods. In this work, we propose Metamorphic Coverage (MC), a coverage metric that measures the distinct code executed by the two test inputs of a metamorphic pair. Our intuition is that a bug can typically be observed only if the corresponding code is executed by one test input but not the other, so pairs that cover more such differential code may be more likely to expose bugs. While most metamorphic testing methods have been based on this general intuition, our work defines and systematically evaluates MC on five widely used metamorphic testing methods for testing database engines, compilers, and constraint solvers. The code measured by MC overlaps with the bug-fix locations of 50 of the 64 bugs found by these methods, and MC correlates more strongly with bug counts than line coverage does. MC is 4x more sensitive than line coverage at distinguishing the effectiveness of testing methods, and its average value is 6x smaller than line coverage while still capturing the part of the program being tested. Computing MC required 359x less time than mutation testing. In a case study of an automated database-system testing approach, we demonstrate that, when used for feedback guidance, MC significantly outperforms code coverage, finding 41% more bugs. Consequently, this work might have broad applications for assessing metamorphic testing methods and improving test-case generation.
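To make the core idea concrete, the differential code covered by a metamorphic pair can be thought of as the symmetric difference of the two inputs' coverage sets. The sketch below is an illustrative toy, not the paper's implementation: it uses Python's `sys.settrace` as a stand-in for a real coverage tool, and `program` and its metamorphic relation (`program(x) == program(-x)`) are hypothetical examples.

```python
import sys
from typing import Callable, Set, Tuple

def trace(func: Callable, arg) -> Set[Tuple[str, int]]:
    """Collect (function name, line number) pairs executed by func(arg)."""
    covered = set()
    def tracer(frame, event, _):
        if event == "line":
            covered.add((frame.f_code.co_name, frame.f_lineno))
        return tracer
    sys.settrace(tracer)
    try:
        func(arg)
    finally:
        sys.settrace(None)
    return covered

def metamorphic_coverage(func: Callable, source_input, follow_up_input) -> int:
    """MC-style metric: size of the symmetric difference of the
    coverage sets of the two inputs in a metamorphic pair."""
    a = trace(func, source_input)
    b = trace(func, follow_up_input)
    return len(a ^ b)

# Hypothetical program under test, with an input-dependent branch.
def program(x):
    if x < 0:
        x = -x          # executed only for negative inputs
    return x * 2

# Metamorphic pair (5, -5): the two executions diverge on one line,
# so this pair exercises code that line coverage alone would not flag.
mc = metamorphic_coverage(program, 5, -5)
print(mc)
```

A pair whose executions follow identical paths yields an MC of zero, matching the intuition that such a pair is unlikely to expose a bug that manifests as a difference between the two executions.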