🤖 AI Summary
Existing mutation testing relies on a fixed set of mutation operators (e.g., operator replacement, statement deletion), which limits its ability to emulate realistic faults and therefore the validity of the resulting test-suite evaluation. This paper proposes a large language model (LLM)-based mutant generation method: it identifies locations in the code to mutate, inserts placeholders there, and uses prompt engineering to guide LLMs (including Codex, Llama, and GPT) to produce context-aware replacements. Because the mutants are not bound to predefined operators, the approach can capture real-world defect patterns that tools such as StrykerJS cannot produce, and it supports multiple prompting strategies and multiple LLMs. Evaluated on 13 open-source JavaScript packages, the method yields more diverse and realistic mutants while keeping running time and computational cost practical.
📝 Abstract
In mutation testing, the quality of a test suite is evaluated by introducing faults into a program and determining whether the program's tests detect them. Most existing approaches to mutation testing apply a fixed set of mutation operators, e.g., replacing a "+" with a "-", or removing a function's body. However, certain types of real-world bugs cannot easily be simulated by such approaches, limiting their effectiveness. This paper presents a technique for mutation testing where placeholders are introduced at designated locations in a program's source code and where a Large Language Model (LLM) is prompted to suggest what they could be replaced with. The technique is implemented in LLMorpheus, a mutation testing tool for JavaScript, and evaluated on 13 subject packages, considering several variations on the prompting strategy, and using several LLMs. We find LLMorpheus to be capable of producing mutants that resemble existing bugs that cannot be produced by StrykerJS, a state-of-the-art mutation testing tool. Moreover, we report on the running time, cost, and number of mutants produced by LLMorpheus, demonstrating its practicality.
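
The placeholder idea described in the abstract can be illustrated with a minimal sketch. This is not LLMorpheus's implementation: the placeholder token, the prompt wording, and the helper names (`insert_placeholder`, `build_prompt`, `apply_candidate`) are all assumptions made for illustration, and no LLM is called. The candidate replacement is hard-coded to stand in for a model's suggestion.

```python
from typing import Tuple

# Hypothetical marker; the token actually used by LLMorpheus may differ.
PLACEHOLDER = "<PLACEHOLDER>"


def insert_placeholder(source: str, start: int, end: int) -> Tuple[str, str]:
    """Mask source[start:end] and return (masked_source, original_fragment)."""
    return source[:start] + PLACEHOLDER + source[end:], source[start:end]


def build_prompt(masked_source: str, original: str) -> str:
    """Assemble an illustrative prompt asking for replacement fragments."""
    return (
        "The following JavaScript code contains a placeholder.\n"
        f"The placeholder originally held: {original}\n"
        "Suggest code fragments that could replace it and change "
        "the program's behavior.\n\n"
        + masked_source
    )


def apply_candidate(masked_source: str, candidate: str) -> str:
    """Substitute one suggested fragment for the placeholder to form a mutant."""
    return masked_source.replace(PLACEHOLDER, candidate, 1)


# A small JavaScript function, held as a string, to be mutated.
js_source = "function inRange(x, lo, hi) { return x >= lo && x < hi; }"
fragment = "x >= lo && x < hi"
start = js_source.index(fragment)

masked, original = insert_placeholder(js_source, start, start + len(fragment))
prompt = build_prompt(masked, original)

# Stand-in for an LLM suggestion: an off-by-one boundary change.
mutant = apply_candidate(masked, "x > lo && x <= hi")
```

In this sketch the mutant shifts both range boundaries at once, a change that a single fixed operator swap such as "+" to "-" would not express. A mutation testing tool would then run the test suite against `mutant` to see whether any test fails.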