InfCode: Adversarial Iterative Refinement of Tests and Patches for Reliable Software Issue Resolution

📅 2025-11-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
In real-world repository-level software defect repair, insufficient test coverage and weak validation signals often lead to incorrect patch acceptance. Method: This paper proposes an adversarial multi-agent framework comprising three LLM-based agents (test generation, code generation, and patch selection) that collaboratively and iteratively optimize test cases and patches within a containerized environment to achieve precise fault localization and rigorous validation. Contribution/Results: Its key innovation is a bidirectional adversarial mechanism between test and code generation, coupled with a failure-driven feedback loop that improves repair robustness. Evaluated on the SWE-bench Verified benchmark, the framework achieves a 79.4% patch correctness rate, surpassing prior state-of-the-art methods and establishing a new best result. The code and models are publicly released.
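The three-agent loop described above can be sketched as plain control flow. This is a minimal, hypothetical sketch, not the paper's implementation: the LLM-backed agents (`generate_test`, `generate_patch`), the containerized test runner (`run_test`), and the Selector heuristic are all stand-in stubs invented here to make the adversarial structure concrete and runnable.

```python
# Hypothetical sketch of an adversarial test/patch refinement loop.
# All agent internals are stubs standing in for LLM calls and a
# containerized test harness; only the control flow mirrors the idea.

def generate_test(issue, patch=None):
    # Test Patch Generator (stub): produce a test that exposes the issue,
    # strengthened against the current candidate patch when one is given.
    return {"expects": issue["expected"]}

def generate_patch(issue, test):
    # Code Patch Generator (stub): propose a fix aimed at passing the test.
    return {"value": issue["expected"]}

def run_test(test, patch):
    # Containerized validation (stub): does the patch satisfy the test?
    return patch["value"] == test["expects"]

def adversarial_refine(issue, max_rounds=3):
    candidates = []
    test = generate_test(issue)
    for _ in range(max_rounds):
        patch = generate_patch(issue, test)
        if run_test(test, patch):
            candidates.append(patch)
            # Adversarial step: harden the test against the passing patch.
            test = generate_test(issue, patch)
        # A failure-driven feedback signal would be fed back here.
    # Selector agent (stub): pick the most reliable surviving candidate.
    return candidates[-1] if candidates else None

issue = {"expected": 42}
print(adversarial_refine(issue))  # → {'value': 42}
```

The point of the bidirectional adversary is that a patch is only accepted after the test generator has had a chance to break it, which is what pushes the loop past tests that are too weak to reject superficially plausible fixes.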

📝 Abstract
Large language models have advanced software engineering automation, yet resolving real-world software issues remains difficult because it requires repository-level reasoning, accurate diagnostics, and strong verification signals. Existing agent-based and pipeline-based methods often rely on insufficient tests, which can lead to patches that satisfy verification but fail to fix the underlying defect. We present InfCode, an adversarial multi-agent framework for automated repository-level issue resolution. InfCode iteratively refines both tests and patches through adversarial interaction between a Test Patch Generator and a Code Patch Generator, while a Selector agent identifies the most reliable fix. The framework runs inside a containerized environment that supports realistic repository inspection, modification, and validation. Experiments on SWE-bench Lite and SWE-bench Verified using models such as DeepSeek-V3 and Claude 4.5 Sonnet show that InfCode consistently outperforms strong baselines. It achieves 79.4% performance on SWE-bench Verified, establishing a new state-of-the-art. We have released InfCode as an open-source project at https://github.com/Tokfinity/InfCode.
Problem

Research questions and friction points this paper is trying to address.

Automating reliable software issue resolution at the repository level
Addressing insufficient testing that leads to incorrect patches
Improving verification signals for accurate software defect fixes
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adversarial multi-agent framework refines tests and patches
Containerized environment enables realistic repository validation
Iterative refinement between test and code patch generators
KeFan Li
Beihang University, China and Beijing Tokfinity Technology Co., Ltd., China
Mengfei Wang
Beijing Tokfinity Technology Co., Ltd., China
Hengzhi Zhang
Beijing Tokfinity Technology Co., Ltd., China
Zhichao Li
Beijing Tokfinity Technology Co., Ltd., China
Yuan Yuan
Beihang University, China
Mu Li
Beihang University, China
Xiang Gao
Beihang University, China
Hailong Sun
Professor of Computer Science, Beihang University
Software Engineering, Artificial Intelligence, Software Systems
Chunming Hu
Beihang University, China
Weifeng Lv
Beihang University, China