🤖 AI Summary
This paper addresses test suite redundancy and high computational overhead in mutation testing for Solidity smart contracts. It proposes PRIMG, an incremental, adaptive test generation framework. Its core innovations are: (1) constructing a mutant subsumption graph and training a lightweight ML model on it to prioritize mutants; and (2) combining LLM-driven iterative test generation with a two-stage refinement step, syntactic and behavioral, to improve test effectiveness. Experiments on real-world Code4Arena projects show that PRIMG reduces test suite size by 47.3% on average while keeping the mutation score above 92.1%. Compared to random selection, its mutant prioritization improves the detection efficiency of high-impact mutants by 3.2×. The refinement mechanism also raises the pass rate and functional correctness of LLM-generated tests by 58.6% and 41.4%, respectively.
📝 Abstract
Mutation testing is a widely recognized technique for assessing and enhancing the effectiveness of software test suites by introducing deliberate code mutations. However, its application often results in overly large test suites, as developers generate numerous tests to kill specific mutants, increasing computational overhead. This paper introduces PRIMG (Prioritization and Refinement Integrated Mutation-driven Generation), a novel framework for incremental and adaptive test case generation for Solidity smart contracts. PRIMG integrates two core components: a mutation prioritization module, which employs a machine learning model trained on mutant subsumption graphs to predict the usefulness of surviving mutants, and a test case generation module, which utilizes Large Language Models (LLMs) to generate and iteratively refine test cases to achieve syntactic and behavioral correctness. We evaluated PRIMG on real-world Solidity projects from Code4Arena to assess its effectiveness in improving mutation scores and generating high-quality test cases. The experimental results demonstrate that PRIMG significantly reduces test suite size while maintaining high mutation coverage. The prioritization module consistently outperformed random mutant selection, enabling the generation of high-impact tests with reduced computational effort. Furthermore, the refining process enhanced the correctness and utility of LLM-generated tests, addressing their inherent limitations in handling edge cases and complex program logic.
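To make the terms in the abstract concrete, here is a minimal sketch of mutation score and mutant subsumption computed from a kill matrix. It is not PRIMG's implementation: the abstract says an ML model is trained on subsumption graphs to predict the usefulness of surviving mutants, and that model is not shown here. The sketch assumes the standard definition in which mutant A subsumes mutant B if A is killed by at least one test and every test that kills A also kills B. All names (`kills`, `mutation_score`, `subsumption_edges`, `dominator_mutants`) and the sample data are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical kill matrix: mutant id -> set of test ids that kill it.
KillMatrix = Dict[str, Set[str]]


def mutation_score(kills: KillMatrix) -> float:
    """Fraction of mutants killed by at least one test."""
    if not kills:
        return 0.0
    killed = sum(1 for tests in kills.values() if tests)
    return killed / len(kills)


def subsumption_edges(kills: KillMatrix) -> List[Tuple[str, str]]:
    """Edges (a, b) meaning 'a subsumes b'.

    a must be killed by some test, and every test killing a must also kill b.
    """
    edges = []
    for a, tests_a in kills.items():
        if not tests_a:
            continue  # surviving mutants subsume nothing under this definition
        for b, tests_b in kills.items():
            if a != b and tests_a <= tests_b:
                edges.append((a, b))
    return edges


def dominator_mutants(kills: KillMatrix) -> List[str]:
    """Killed mutants that no other killed mutant strictly subsumes."""
    result = []
    for m, tests_m in kills.items():
        if not tests_m:
            continue
        strictly_subsumed = any(
            tests_o and tests_o < tests_m
            for o, tests_o in kills.items()
            if o != m
        )
        if not strictly_subsumed:
            result.append(m)
    return sorted(result)


# Illustrative data only (assumption, not from the paper).
kills = {
    "m1": {"t1"},
    "m2": {"t1", "t2"},
    "m3": {"t3"},
    "m4": set(),  # surviving mutant
}
```

In this toy data, any test that kills `m1` also kills `m2`, so targeting `m1` and `m3` is enough to kill every killable mutant. This is the intuition behind using subsumption to keep test suites small.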