LLMorpheus: Mutation Testing using Large Language Models

📅 2024-04-15

🏛️ arXiv.org

📈 Citations: 2

✨ Influential: 0

🤖 AI Summary

Existing mutation testing tools apply a fixed set of mutation operators (e.g., operator replacement, statement deletion), which limits their ability to emulate realistic faults and so constrains how well they assess a test suite. This paper proposes an LLM-based mutant generation technique: it introduces placeholders at designated locations in a program's source code and prompts a Large Language Model (LLM) to suggest what they could be replaced with. Because the mutants are not restricted to a fixed operator set, the resulting tool, LLMorpheus, can produce mutants that resemble existing real-world bugs and that StrykerJS cannot produce. The technique is evaluated with several prompting strategies and several LLMs on 13 open-source JavaScript packages, and the authors report running time, cost, and number of mutants produced as evidence of its practicality.

📝 Abstract

In mutation testing, the quality of a test suite is evaluated by introducing faults into a program and determining whether the program's tests detect them. Most existing approaches for mutation testing involve the application of a fixed set of mutation operators, e.g., replacing a "+" with a "-", or removing a function's body. However, certain types of real-world bugs cannot easily be simulated by such approaches, limiting their effectiveness. This paper presents a technique for mutation testing where placeholders are introduced at designated locations in a program's source code and where a Large Language Model (LLM) is prompted to suggest what they could be replaced with. The technique is implemented in LLMorpheus, a mutation testing tool for JavaScript, and evaluated on 13 subject packages, considering several variations on the prompting strategy, and using several LLMs. We find LLMorpheus to be capable of producing mutants that resemble existing bugs that cannot be produced by StrykerJS, a state-of-the-art mutation testing tool. Moreover, we report on the running time, cost, and number of mutants produced by LLMorpheus, demonstrating its practicality.
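
To make the basic idea concrete, here is a minimal sketch of mutation testing with a fixed operator set, the conventional setting the abstract contrasts against. Everything in it (the `add` function, the two test suites, and the helper names `isKilled` and `mutationScore`) is a made-up illustration and is not taken from LLMorpheus or StrykerJS.

```typescript
// Hypothetical illustration: all names and the tiny "test suites" are
// assumptions, not part of LLMorpheus or StrykerJS.

type BinaryFn = (a: number, b: number) => number;
type Test = (fn: BinaryFn) => boolean; // true means the test passes

// Original function under test.
const add: BinaryFn = (a, b) => a + b;

// Mutants from a fixed operator set: each swaps "+" for another operator.
const mutants: { label: string; fn: BinaryFn }[] = [
  { label: '"+" -> "-"', fn: (a, b) => a - b },
  { label: '"+" -> "*"', fn: (a, b) => a * b },
];

// A weak suite: the check add(2, 2) === 4 also holds for 2 * 2.
const weakSuite: Test[] = [(fn) => fn(2, 2) === 4];

// A stronger suite adds a case that separates "+" from "*".
const strongSuite: Test[] = [...weakSuite, (fn) => fn(2, 3) === 5];

// A mutant is "killed" when at least one test fails on it.
function isKilled(fn: BinaryFn, suite: Test[]): boolean {
  return suite.some((test) => !test(fn));
}

// Mutation score: fraction of mutants that the suite kills.
function mutationScore(suite: Test[]): number {
  const killed = mutants.filter((m) => isKilled(m.fn, suite)).length;
  return killed / mutants.length;
}

console.log(mutationScore(weakSuite), mutationScore(strongSuite));
```

In this toy setup the weak suite lets the "*" mutant survive, which is the signal mutation testing uses to point at a missing test case.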

Problem

Research questions and friction points this paper is trying to address.

Enhances mutation testing by using LLMs to generate diverse mutants.

Addresses limitations of traditional mutation operators in simulating real-world bugs.

Evaluates practicality of LLM-based mutation testing in JavaScript programs.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Large Language Models for mutation testing.

Introduces placeholders in source code for mutations.

Evaluates effectiveness with multiple LLM strategies.
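
The placeholder idea in the list above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `<PLACEHOLDER>` token, the prompt wording, the `MutationSite` shape, and all function names are hypothetical and do not reflect LLMorpheus's actual prompt format or API. The model call itself is left out; suggestions are passed in as plain strings.

```typescript
// Hypothetical sketch: the placeholder token, prompt wording, and function
// names are assumptions for illustration, not LLMorpheus's actual format.

const PLACEHOLDER = "<PLACEHOLDER>";

interface MutationSite {
  start: number; // inclusive character offset into the source
  end: number; // exclusive character offset into the source
}

// Replace the code at a site with the placeholder token.
function insertPlaceholder(source: string, site: MutationSite): string {
  return source.slice(0, site.start) + PLACEHOLDER + source.slice(site.end);
}

// Build a prompt asking the model what the placeholder could be replaced with.
function buildPrompt(source: string, site: MutationSite): string {
  const original = source.slice(site.start, site.end);
  return [
    "The following JavaScript code contains a placeholder.",
    "Suggest code fragments that could replace it and change the program's behavior.",
    `The original fragment was: ${original}`,
    "",
    insertPlaceholder(source, site),
  ].join("\n");
}

// Substitute each suggested replacement back in to obtain candidate mutants.
function applySuggestions(
  source: string,
  site: MutationSite,
  suggestions: string[],
): string[] {
  const original = source.slice(site.start, site.end);
  return suggestions
    .filter((s) => s !== original) // identical text would not be a mutant
    .map((s) => source.slice(0, site.start) + s + source.slice(site.end));
}

const source = "for (let i = 0; i < n; i++) { total += xs[i]; }";
const fragment = "i < n";
const site: MutationSite = {
  start: source.indexOf(fragment),
  end: source.indexOf(fragment) + fragment.length,
};

console.log(buildPrompt(source, site));
// Suggestions are hard-coded here as a stand-in for a model response.
console.log(applySuggestions(source, site, ["i <= n", "i < n"]));
```

Each candidate produced this way would then be run against the test suite like any other mutant, so the prompt-based step only changes where mutants come from, not how they are evaluated.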

🔎 Similar Papers

An Exploratory Study on Using Large Language Models for Mutation Testing