InfCode: Adversarial Iterative Refinement of Tests and Patches for Reliable Software Issue Resolution

📅 2025-11-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
In real-world repository-level software defect repair, insufficient test coverage and weak validation signals often lead to incorrect patch acceptance. Method: This paper proposes an adversarial multi-agent framework comprising three LLM-based agents (test generation, code generation, and patch selection) that collaboratively and iteratively optimize test cases and patches within a containerized environment to achieve precise fault localization and rigorous validation. Contribution/Results: Its key innovation is a bidirectional adversarial mechanism between test and code generation, coupled with a failure-driven feedback loop that improves repair robustness. Evaluated on the SWE-bench Verified benchmark, the framework achieves a 79.4% patch correctness rate, surpassing prior state-of-the-art methods and establishing a new best result. The code and models are publicly released.
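The three-agent loop described above can be sketched as plain control flow. This is a minimal, hypothetical sketch, not the paper's implementation: the LLM-backed agents (`generate_test`, `generate_patch`), the containerized test runner (`run_test`), and the Selector heuristic are all stand-in stubs invented here to make the adversarial structure concrete and runnable.

```python
# Hypothetical sketch of an adversarial test/patch refinement loop.
# All agent internals are stubs standing in for LLM calls and a
# containerized test harness; only the control flow mirrors the idea.

def generate_test(issue, patch=None):
    # Test Patch Generator (stub): produce a test that exposes the issue,
    # strengthened against the current candidate patch when one is given.
    return {"expects": issue["expected"]}

def generate_patch(issue, test):
    # Code Patch Generator (stub): propose a fix aimed at passing the test.
    return {"value": issue["expected"]}

def run_test(test, patch):
    # Containerized validation (stub): does the patch satisfy the test?
    return patch["value"] == test["expects"]

def adversarial_refine(issue, max_rounds=3):
    candidates = []
    test = generate_test(issue)
    for _ in range(max_rounds):
        patch = generate_patch(issue, test)
        if run_test(test, patch):
            candidates.append(patch)
            # Adversarial step: harden the test against the passing patch.
            test = generate_test(issue, patch)
        # A failure-driven feedback signal would be fed back here.
    # Selector agent (stub): pick the most reliable surviving candidate.
    return candidates[-1] if candidates else None

issue = {"expected": 42}
print(adversarial_refine(issue))  # → {'value': 42}
```

The point of the bidirectional adversary is that a patch is only accepted after the test generator has had a chance to break it, which is what pushes the loop past tests that are too weak to reject superficially plausible fixes.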

📝 Abstract
Large language models have advanced software engineering automation, yet resolving real-world software issues remains difficult because it requires repository-level reasoning, accurate diagnostics, and strong verification signals. Existing agent-based and pipeline-based methods often rely on insufficient tests, which can lead to patches that satisfy verification but fail to fix the underlying defect. We present InfCode, an adversarial multi-agent framework for automated repository-level issue resolution. InfCode iteratively refines both tests and patches through adversarial interaction between a Test Patch Generator and a Code Patch Generator, while a Selector agent identifies the most reliable fix. The framework runs inside a containerized environment that supports realistic repository inspection, modification, and validation. Experiments on SWE-bench Lite and SWE-bench Verified using models such as DeepSeek-V3 and Claude 4.5 Sonnet show that InfCode consistently outperforms strong baselines. It achieves 79.4% performance on SWE-bench Verified, establishing a new state-of-the-art. We have released InfCode as an open-source project at https://github.com/Tokfinity/InfCode.
Problem

Research questions and friction points this paper is trying to address.

Automating reliable software issue resolution at the repository level
Addressing insufficient testing that leads to incorrect patches
Improving verification signals for accurate software defect fixes
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adversarial multi-agent framework refines tests and patches
Containerized environment enables realistic repository validation
Iterative refinement between test and code patch generators
KeFan Li
Beihang University, China and Beijing Tokfinity Technology Co., Ltd., China
Mengfei Wang
Beijing Tokfinity Technology Co., Ltd., China
Hengzhi Zhang
Beijing Tokfinity Technology Co., Ltd., China
Zhichao Li
Beijing Tokfinity Technology Co., Ltd., China
Yuan Yuan
Beihang University, China
Mu Li
Beihang University, China
Xiang Gao
Beihang University, China
Hailong Sun
Professor of Computer Science, Beihang University
Software Engineering, Artificial Intelligence, Software Systems
Chunming Hu
Beihang University, China
Weifeng Lv
Beihang University, China