🤖 AI Summary
To address the ambiguity and inconsistency in large language model (LLM)-generated code that stem from vague user requirements, this paper proposes a metamorphic relation (MR)-guided method that refines task specifications before code is generated. MRs are used here for proactive specification refinement rather than for conventional post-hoc validation. We design an MR-driven agent, CodeMetaAgent (CMA), that models semantic constraints on LLM outputs and refines specifications iteratively through a closed feedback loop. The framework integrates multiple state-of-the-art models, including GPT-4o, Mistral Large, GPT-OSS, and Qwen3-Coder, within a rule-guided code generation architecture. Evaluated on HumanEval-Pro, MBPP-Pro, and SWE-Bench_Lite, the approach improves code generation accuracy by up to 17% and achieves code coverage gains of up to 99.81%, making outputs more consistent and reliable across diverse programming tasks.
📝 Abstract
Metamorphic Relations (MRs) serve as a foundational mechanism for generating semantically equivalent mutations. Software engineering has advanced significantly in recent years with the advent of Large Language Models (LLMs). However, the reliability of LLMs in software engineering is often compromised by ambiguities and inconsistencies caused by imprecise user specifications. To address this challenge, we present CodeMetaAgent (CMA), a metamorphic relation-driven LLM agent that systematically refines task specifications and generates semantically constrained test cases. Unlike the traditional use of MRs for post-hoc validation, our framework applies MRs together with LLMs to improve generation consistency and reduce the variability caused by specifications. We evaluated the framework on the HumanEval-Pro, MBPP-Pro, and SWE-Bench_Lite datasets using the GPT-4o, Mistral Large, GPT-OSS, and Qwen3-Coder models. It improved code generation accuracy by up to 17% and achieved code coverage gains of up to 99.81%. These results show that metamorphic relations can be a simple but effective guide for LLM-based software development.
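For readers unfamiliar with metamorphic relations, the general idea can be sketched in a few lines of Python. This is an illustration of a classical MR check only, not CMA's pipeline: the relation chosen (output is unchanged when the input is reordered), the helper names `rotations` and `order_invariance_holds`, and the two candidate functions are all assumptions made for the example.

```python
from typing import Callable, List


def rotations(xs: List[int]) -> List[List[int]]:
    """Return every non-trivial rotation of xs (used as follow-up inputs)."""
    return [xs[i:] + xs[:i] for i in range(1, len(xs))]


def order_invariance_holds(candidate: Callable[[List[int]], int],
                           source_input: List[int]) -> bool:
    """Check the MR: reordering the input must not change the output."""
    expected = candidate(list(source_input))
    return all(candidate(follow_up) == expected
               for follow_up in rotations(list(source_input)))


def max_correct(xs: List[int]) -> int:
    # Reference behaviour for a "return the largest element" task.
    return max(xs)


def max_buggy(xs: List[int]) -> int:
    # Hypothetical faulty candidate: assumes the largest value comes first.
    return xs[0]


if __name__ == "__main__":
    data = [9, 3, 4, 1]
    print(order_invariance_holds(max_correct, data))  # True
    print(order_invariance_holds(max_buggy, data))    # False
```

Note that `max_buggy` returns the right answer on the source input `[9, 3, 4, 1]` and is only exposed by the follow-up inputs, which is why an MR can catch faults without a known expected output for every test case.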