Specification Vibing for Automated Program Repair

📅 2026-02-09
🤖 AI Summary
This work proposes VibeRepair, a program repair paradigm that shifts from code-centric reasoning to reasoning over an explicit behavioral specification, with the aim of mitigating the hallucinated, behaviorally inconsistent fixes that large language model-based approaches commonly produce. VibeRepair first translates buggy code into a structured behavioral specification, corrects deviations in this specification, and then synthesizes repaired code. For hard cases, an on-demand reasoning component adds program analysis and historical bug-fix evidence. By reframing repair as an alignment of behavioral intent rather than direct code editing, the approach narrows the patch space and improves accuracy. Evaluated on Defects4J v1.2 and v2.0, VibeRepair correctly repairs 174 and 178 bugs, respectively, exceeding the strongest baselines by 28 and 33 bugs (19% and 23% relative improvement), and it remains effective on real-world benchmarks collected after the training period of the selected LLMs.
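The reported percentages can be checked against the bug counts. The short sketch below assumes the improvement is measured relative to the baseline's count (repaired bugs minus the margin); the baseline figures of 146 and 145 are derived from the numbers above rather than stated in the source, and the function name is only illustrative.

```python
def relative_improvement(repaired: int, margin: int) -> float:
    """Improvement over the implied baseline (repaired - margin), as a fraction."""
    baseline = repaired - margin  # assumption: margin is measured against this count
    return margin / baseline


if __name__ == "__main__":
    for version, repaired, margin in [("v1.2", 174, 28), ("v2.0", 178, 33)]:
        baseline = repaired - margin
        pct = relative_improvement(repaired, margin) * 100
        print(f"Defects4J {version}: baseline {baseline}, improvement {pct:.1f}%")
    # Defects4J v1.2: baseline 146, improvement 19.2%
    # Defects4J v2.0: baseline 145, improvement 22.8%
```

Rounded to whole percentages, these match the reported 19% and 23%.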

📝 Abstract
Large language model (LLM)-driven automated program repair (APR) has advanced rapidly, but most methods remain code-centric: they directly rewrite source code and thereby risk hallucinated, behaviorally inconsistent fixes. This limitation suggests the need for an alternative repair paradigm that relies on a representation more accessible to LLMs than raw code, enabling more accurate understanding, analysis, and alignment during repair. To address this gap, we propose VibeRepair, a specification-centric APR technique that treats repair as behavior-specification repair rather than ad-hoc code editing. VibeRepair first translates buggy code into a structured behavior specification that captures the program's intended runtime behavior, then infers and repairs specification misalignments, and finally synthesizes code strictly guided by the corrected behavior specification. An on-demand reasoning component enriches hard cases with program analysis and historical bug-fix evidence while controlling cost. Across Defects4J and real-world benchmarks and multiple LLMs, VibeRepair demonstrates consistently strong repair effectiveness with a significantly smaller patch space. On Defects4J v1.2, VibeRepair correctly repairs 174 bugs, exceeding the strongest state-of-the-art baseline by 28 bugs, which corresponds to a 19% improvement. On Defects4J v2.0, it repairs 178 bugs, outperforming prior approaches by 33 bugs, representing a 23% improvement. Evaluations on real-world benchmarks collected after the training period of selected LLMs further confirm its effectiveness and generalizability. By centering repair on explicit behavioral intent, VibeRepair reframes APR for the era of "vibe" coding: make the behavior sing, and the code will follow.
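The three stages described in the abstract can be sketched as a small pipeline. This is only an illustration of the control flow, not the authors' implementation: the prompts, the `LLM` callable type, the `is_hard` and `gather_evidence` hooks, and the `RepairResult` fields are all hypothetical, since the abstract does not specify how the specification is structured or how hard cases are detected.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical model interface: a prompt goes in, a completion comes out.
LLM = Callable[[str], str]


@dataclass
class RepairResult:
    spec: str            # behavior specification extracted from the buggy code
    repaired_spec: str   # specification after misalignments are corrected
    patch: str           # code synthesized from the corrected specification
    used_evidence: bool  # whether the on-demand step was triggered


def repair(
    buggy_code: str,
    llm: LLM,
    is_hard: Callable[[str], bool] = lambda code: False,   # assumed hook
    gather_evidence: Callable[[str], str] = lambda code: "",  # assumed hook
) -> RepairResult:
    # Stage 1: translate the buggy code into a behavior specification.
    spec = llm(f"Describe the intended runtime behavior of this code:\n{buggy_code}")

    # On-demand reasoning: pay for extra context only on hard cases.
    hard = is_hard(buggy_code)
    evidence = gather_evidence(buggy_code) if hard else ""

    # Stage 2: repair the specification rather than the code.
    repaired_spec = llm(
        "Correct any misalignment in this behavior specification:\n"
        f"{spec}\nAdditional evidence:\n{evidence}"
    )

    # Stage 3: synthesize code guided only by the corrected specification.
    patch = llm(f"Write code that implements exactly this specification:\n{repaired_spec}")
    return RepairResult(spec, repaired_spec, patch, hard)


if __name__ == "__main__":
    def stub_llm(prompt: str) -> str:
        # Stand-in for a real model so the sketch runs offline.
        return "<output for: " + prompt.splitlines()[0] + ">"

    print(repair("int half(int x) { return x / 3; }", stub_llm).patch)
```

Note that the third call sees only the repaired specification, not the original source, which mirrors the abstract's claim that synthesis is "strictly guided by the corrected behavior specification".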

Problem

Research questions and friction points this paper is trying to address.

automated program repair
large language models
behavioral specification
code hallucination
program correctness

Innovation

Methods, ideas, or system contributions that make the work stand out.

specification-centric repair
behavioral specification
LLM-driven program repair
automated program repair
vibe coding