Automated Repair of Ambiguous Natural Language Requirements

📅 2025-05-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
Ambiguity in natural-language requirements frequently leads large language models (LLMs) to generate incorrect code. To address this, we propose SpecFix, a fully automated ambiguity-repair method that requires no metacognitive capability from the LLM. SpecFix first models the distribution of interpretations a requirement induces in an LLM, as embodied in the programs the LLM generates, and analyzes and repairs that distribution using program testing and automated program repair. It then infers a minimal, repaired requirement from the changes to this distribution via contractive specification inference. Crucially, SpecFix decomposes ambiguity repair into two phases, (1) modeling and repairing the program-interpretation distribution and (2) contraction-based specification inference, thereby eliminating any reliance on LLM self-reflection. Evaluated on HumanEval+ and MBPP+ with GPT-4o, DeepSeek-V3, and Qwen2.5-Coder, SpecFix improves Pass@1 by 4.3% on average and helps the models solve 3.4% more problems via majority vote.

📝 Abstract
The rise of large language models (LLMs) has amplified the role of natural language (NL) in software engineering, yet NL's inherent ambiguity and susceptibility to misinterpretation pose a fundamental challenge for software quality, because ambiguous requirements may result in the generation of faulty programs. The complexity of ambiguity detection and resolution motivates us to introduce the problem of automated repair of ambiguous NL requirements. Repairing ambiguity in requirements is challenging for LLMs because it demands a metacognitive capability: the ability to reflect on how alterations to a text influence their own interpretation of that text. Indeed, our experiments show that directly prompting an LLM to detect and resolve ambiguities results in irrelevant or inconsistent clarifications. Our key novelty is in decomposing this problem into simpler subproblems that do not require metacognitive reasoning. First, we analyze and repair the LLM's interpretation of requirements, embodied in the distribution of programs it induces, using traditional testing and program repair methods. Second, we repair the requirements based on the changes to this distribution via what we refer to as contractive specification inference. This decomposition enables targeted, minimal requirement repairs that yield cross-model performance gains in code generation. We implemented this approach in a tool, SpecFix, and evaluated it with three state-of-the-art LLMs (GPT-4o, DeepSeek-V3, and Qwen2.5-Coder-32b-Instruct) on two widely used code-generation benchmarks, HumanEval+ and MBPP+. Our results show that SpecFix, operating autonomously without human intervention or external information, outputs repaired requirements that, when used by LLMs for code generation, increase the Pass@1 score by 4.3% and help the LLMs solve 3.4% more problems via majority vote.
Problem

Research questions and friction points this paper is trying to address.

Automated repair of ambiguous natural language requirements in software engineering
Addressing LLMs' limitations in detecting and resolving ambiguities without metacognitive reasoning
Improving code generation accuracy via targeted requirement repairs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decompose ambiguity repair into simpler subproblems
Use testing and program repair for interpretation analysis
Apply contractive specification inference for requirement changes
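The first subproblem above, modeling the distribution of program interpretations a requirement induces, can be sketched as follows. This is a minimal illustration, not SpecFix's actual implementation: the function names (`interpretation_distribution`, `behavior_signature`, `is_ambiguous`), the behavioral-clustering strategy, and the 0.8 dominance threshold are all illustrative assumptions. The idea it demonstrates is that sampled candidate programs can be grouped into semantic equivalence classes by their outputs on shared test inputs, and a requirement flagged as ambiguous when no single interpretation dominates.

```python
from collections import Counter

def behavior_signature(program, test_inputs):
    """Fingerprint a program by its outputs on shared test inputs.

    Programs with identical signatures are treated as the same
    interpretation of the requirement.
    """
    outputs = []
    for x in test_inputs:
        try:
            outputs.append(program(x))
        except Exception:
            outputs.append("<error>")  # crashing behavior is a signature too
    return tuple(outputs)

def interpretation_distribution(programs, test_inputs):
    """Cluster sampled programs by behavior and return cluster frequencies."""
    signatures = [behavior_signature(p, test_inputs) for p in programs]
    counts = Counter(signatures)
    total = len(programs)
    return {sig: n / total for sig, n in counts.items()}

def is_ambiguous(distribution, dominance_threshold=0.8):
    """Flag the requirement as ambiguous if no interpretation dominates."""
    return max(distribution.values()) < dominance_threshold

# Toy example: "halve the input" sampled as integer vs. float division.
# In practice the candidates would be programs sampled from an LLM.
candidates = [lambda x: x // 2] * 6 + [lambda x: x / 2] * 4
dist = interpretation_distribution(candidates, test_inputs=[1, 3, 5])
print(len(dist), is_ambiguous(dist))  # two clusters; neither reaches 0.8
```

The second phase would then contract the requirement text (e.g. "halve the input, returning a float") and resample until one interpretation cluster dominates, which is where the constraint-driven specification inference of the paper comes in.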