LLM Assisted Coding with Metamorphic Specification Mutation Agent

📅 2025-11-22

🤖 AI Summary

To address ambiguity and inconsistency in large language model (LLM)-generated code stemming from vague user requirements, this paper proposes CodeMetaAgent (CMA), a metamorphic relation (MR)-driven agent that refines task specifications before code generation, rather than applying MRs only as conventional post-hoc validation. The agent uses MRs to model semantic constraints on LLM outputs, generates semantically constrained test cases, and feeds results back through a closed-loop refinement process. The framework is evaluated with GPT-4o, Mistral Large, GPT-OSS, and Qwen3-Coder on HumanEval-Pro, MBPP-Pro, and SWE-Bench_Lite, where it improves code generation accuracy by up to 17% and reports code coverage gains of up to 99.81%.

📝 Abstract

Metamorphic Relations (MRs) serve as a foundational mechanism for generating semantically equivalent mutations. Software engineering has advanced significantly in recent years with the advent of Large Language Models (LLMs). However, the reliability of LLMs in software engineering is often compromised by ambiguities and inconsistencies due to improper user specification. To address this challenge, we present CodeMetaAgent (CMA), a metamorphic relation-driven LLM agent that systematically refines task specifications and generates semantically constrained test cases. Our proposed framework uses MRs with LLMs to improve generation consistency and reduce variability caused by specifications, unlike the traditional use of MRs as post validations. Our framework has been evaluated on the HumanEval-Pro, MBPP-Pro, and SWE-Bench_Lite datasets using the GPT-4o, Mistral Large, GPT-OSS, and Qwen3-Coder models. It improved code generation accuracy by up to 17% and achieved code coverage gains of up to 99.81%. These results show that metamorphic relations can be a simple but effective guide in assisting LLM-based software development.
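
For readers unfamiliar with metamorphic relations, the sketch below illustrates the general idea only; it is not the CodeMetaAgent implementation, which the abstract does not detail. An MR states how the output should (or should not) change when the input is transformed in a known way, so a candidate program can be checked without knowing the expected output for any single input. The function `sorted_unique`, the two relations, and the helper `check_mrs` are all hypothetical names chosen for illustration.

```python
import random
from typing import Callable, List, Sequence

IntListFn = Callable[[Sequence[int]], List[int]]


def sorted_unique(xs: Sequence[int]) -> List[int]:
    """Hypothetical function under test: distinct values in ascending order."""
    return sorted(set(xs))


def holds_permutation_mr(f: IntListFn, xs: Sequence[int], rng: random.Random) -> bool:
    """MR 1 (assumed for this example): reordering the input must not change the output."""
    shuffled = list(xs)
    rng.shuffle(shuffled)
    return f(xs) == f(shuffled)


def holds_duplication_mr(f: IntListFn, xs: Sequence[int]) -> bool:
    """MR 2 (assumed for this example): repeating the input must not change the output."""
    return f(xs) == f(list(xs) + list(xs))


def check_mrs(f: IntListFn, inputs: Sequence[Sequence[int]], seed: int = 0) -> List[str]:
    """Return a description of every MR violation found; an empty list means none."""
    rng = random.Random(seed)  # fixed seed keeps the check reproducible
    violations: List[str] = []
    for xs in inputs:
        if not holds_permutation_mr(f, xs, rng):
            violations.append(f"permutation MR violated for {list(xs)}")
        if not holds_duplication_mr(f, xs):
            violations.append(f"duplication MR violated for {list(xs)}")
    return violations


if __name__ == "__main__":
    sample_inputs = [[3, 1, 2], [5, 5, 1], []]
    print(check_mrs(sorted_unique, sample_inputs))
    # A candidate that ignores the specification and echoes its input
    # fails the duplication relation on every non-empty input.
    print(check_mrs(lambda xs: list(xs), sample_inputs))
```

In a specification-refinement setting, relations like these could plausibly be stated alongside the task description so that generated code and generated tests are held to the same constraints; how CMA actually selects and applies its MRs is described in the paper itself.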

Problem

Research questions and friction points this paper is trying to address.

Addresses LLM reliability issues in software engineering

Reduces code generation variability from ambiguous specifications

Improves accuracy and coverage in automated programming tasks

Innovation

Methods, ideas, or system contributions that make the work stand out.

Metamorphic relations refine task specifications systematically

Agent generates semantically constrained test cases

Framework improves code generation accuracy and coverage

🔎 Similar Papers

SpecGen: Automated Generation of Formal Program Specifications via Large Language Models