ContraFix: Agentic Vulnerability Repair via Differential Runtime Evidence and Skill Reuse

📅 2026-05-17

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing large language model agents for automated vulnerability repair often misidentify root causes because they reason from the failing execution trace alone. This leads to superficial patches, and knowledge is rarely reused across vulnerabilities. To address both issues, the authors propose ContraFix, a framework that constructs boundary test cases and inserts state probes to contrast runtime evidence between crashing and non-crashing executions, then distills the differences into causality-aware repair specifications. ContraFix also maintains a reusable base of repair skills, retrieved through a three-tier strategy, so that later tasks can draw on earlier repairs; it thereby combines differential dynamic analysis with skill transfer. With GPT-5-mini, the approach achieves state-of-the-art results, resolving 84.0% of tasks on SEC-Bench and 73.8% on PatchEval at less than one-third the cost of the strongest comparable baseline.

📝 Abstract

Large language model (LLM) agents are increasingly used for automated vulnerability repair (AVR), where repository-level reasoning enables them to inspect context and produce source-code patches. However, recent empirical results show that these agents still struggle with real-world vulnerabilities. Their main failure mode is semantic misunderstanding: choosing a repair direction that does not match the root cause. We identify two reasons for this gap. First, existing agents usually reason from the failing execution alone. A crash report can pinpoint where the program failed, but it does not reveal which variable or state transition, among many candidates near the fault site, separates the crashing behavior from safe execution. As a result, agents often produce symptom-oriented patches instead of causal fixes. Second, evidence collected for one vulnerability is rarely retained, so similar cases in later repositories must be diagnosed again from scratch. We present ContraFix, an agentic AVR framework that couples differential runtime evidence with reusable repair skills. Its Mutator constructs PoC variants that straddle the failure boundary; its Analyzer inserts state probes around the fault region and summarizes divergences between crashing and non-crashing executions into a repair specification; and its Patcher converts the specification into verified source patches. Each successful repair updates a two-track skill base containing repair specifications and mutation strategies, which are retrieved through a three-tier policy for future instances. On SEC-Bench (C/C++, 200 instances) and PatchEval (Go, Python, JavaScript, 225 instances), ContraFix with GPT-5-mini resolves 84.0% and 73.8% of the tasks, respectively, achieving state-of-the-art performance on both benchmarks while costing less than one-third of the strongest comparable baseline.
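
To make the idea of differential runtime evidence concrete, here is a minimal Python sketch. It is not ContraFix's implementation: the `Run` record, the `separating_probes` function, the probe names, and the sample values are all hypothetical and chosen only for illustration. It assumes each PoC variant yields one run, labeled as crashing or not, together with integer values captured by state probes near the fault site. A probe counts as "separating" when the values seen in crashing runs never occur in safe runs.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One execution of a PoC variant (hypothetical record, not from the paper)."""
    crashed: bool
    probes: dict[str, int]  # probe name -> value observed near the fault site


def separating_probes(runs: list[Run]) -> dict[str, tuple[list[int], list[int]]]:
    """Return probes whose observed values are disjoint between safe and crashing runs.

    The result maps probe name -> (sorted safe values, sorted crashing values).
    An empty dict is returned when either group of runs is missing, since no
    contrast can be drawn from one side alone.
    """
    crashing = [r for r in runs if r.crashed]
    safe = [r for r in runs if not r.crashed]
    if not crashing or not safe:
        return {}

    # Only compare probes that were recorded in every run.
    shared = set.intersection(*(set(r.probes) for r in runs))

    result: dict[str, tuple[list[int], list[int]]] = {}
    for name in sorted(shared):
        crash_vals = {r.probes[name] for r in crashing}
        safe_vals = {r.probes[name] for r in safe}
        if crash_vals.isdisjoint(safe_vals):
            result[name] = (sorted(safe_vals), sorted(crash_vals))
    return result


# Illustrative data: variants that straddle a made-up 64-byte buffer boundary.
RUNS = [
    Run(crashed=False, probes={"len": 63, "buf_size": 64, "flags": 1}),
    Run(crashed=False, probes={"len": 64, "buf_size": 64, "flags": 0}),
    Run(crashed=True, probes={"len": 65, "buf_size": 64, "flags": 1}),
    Run(crashed=True, probes={"len": 80, "buf_size": 64, "flags": 0}),
]

if __name__ == "__main__":
    print(separating_probes(RUNS))
```

In this toy data only `len` separates the two groups: `buf_size` is constant and `flags` takes the same values on both sides. A crash report alone would list all three variables near the fault site, whereas the contrast points to the length check as the likely causal condition.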

Problem

Research questions and friction points this paper is trying to address.

automated vulnerability repair

semantic misunderstanding

differential runtime evidence

repair skill reuse

root cause identification

Innovation

Methods, ideas, or system contributions that make the work stand out.

differential runtime evidence

skill reuse

agentic vulnerability repair