Beyond Fixed Tests: Repository-Level Issue Resolution as Coevolution of Code and Behavioral Constraints

📅 2026-04-06

🤖 AI Summary

This work addresses a key limitation of conventional large language model (LLM)-based program repair approaches: they treat tests as static constraints and therefore often yield under-constrained, fragile, or overfitted patches. To overcome this, the authors propose Agent-CoEvo, a framework that models repository-level repair as a coevolutionary process between code patches and test patches. Using a multi-agent architecture, Agent-CoEvo refines behavioral constraints through iterative mutual evaluation and semantic recombination, so that implementation and specification evolve jointly. Evaluated on SWE-bench Lite and SWT-bench Lite, the framework consistently outperforms state-of-the-art agent-based and agentless baselines in both repair success and test reproduction quality.
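
The summary does not say how mutual evaluation is scored. As a rough illustration only, the Python sketch below assumes one plausible scheme. A test candidate earns credit if it fails on the unpatched code (it reproduces the issue) and at least one code candidate passes it, with a bonus for rejecting some candidates. A code candidate is scored by the fitness-weighted share of tests it passes. The names `mutual_evaluation` and `passes`, the weighting, and the toy "clamp negatives to zero" issue are hypothetical and are not Agent-CoEvo's actual implementation; real candidates would be repository diffs run in a test harness, not Python callables.

```python
from typing import Callable, Dict, Tuple

# Hypothetical toy stand-ins: a code patch is modeled as the patched function
# itself, and a test patch as a predicate that checks such a function.
CodePatch = Callable[[int], int]
TestPatch = Callable[[CodePatch], bool]


def passes(test: TestPatch, code: CodePatch) -> bool:
    """Run one test against one code candidate; an exception counts as a failure."""
    try:
        return bool(test(code))
    except Exception:
        return False


def mutual_evaluation(
    original: CodePatch,
    codes: Dict[str, CodePatch],
    tests: Dict[str, TestPatch],
) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Score code and test candidates against each other (illustrative scheme)."""
    # Pass matrix: matrix[c][t] is True when code candidate c passes test t.
    matrix = {c: {t: passes(tf, cf) for t, tf in tests.items()} for c, cf in codes.items()}

    test_fit: Dict[str, float] = {}
    for t, tf in tests.items():
        reproduces = not passes(tf, original)  # must fail on the unpatched code
        n_pass = sum(matrix[c][t] for c in codes)
        if not reproduces or n_pass == 0:
            # Either it encodes the buggy behavior, or no candidate satisfies it.
            test_fit[t] = 0.0
        else:
            # Base credit plus a bonus for rejecting some candidates.
            test_fit[t] = 1.0 + (len(codes) - n_pass) / len(codes)

    # Code fitness: share of the total test fitness that a candidate passes.
    total = sum(test_fit.values())
    code_fit = {
        c: (sum(test_fit[t] for t in tests if matrix[c][t]) / total) if total else 0.0
        for c in codes
    }
    return code_fit, test_fit


def original(x: int) -> int:
    return x  # toy bug: negative inputs are not clamped to zero


codes = {
    "clamp": lambda x: max(x, 0),  # general fix
    "zero": lambda x: 0,           # overfitted: only satisfies f(-5) == 0
}
tests = {
    "neg_only": lambda f: f(-5) == 0,
    "neg_and_pos": lambda f: f(-5) == 0 and f(3) == 3,
    "old_behavior": lambda f: f(-5) == -5,  # passes on original: no reproduction
    "impossible": lambda f: f(3) == 4,      # no candidate can satisfy it
}
code_fit, test_fit = mutual_evaluation(original, codes, tests)
print(code_fit)  # {'clamp': 1.0, 'zero': 0.4}
print(test_fit)  # {'neg_only': 1.0, 'neg_and_pos': 1.5, 'old_behavior': 0.0, 'impossible': 0.0}
```

In this toy run, the stronger test `neg_and_pos` earns the most credit because it separates the general fix from the overfitted one, which in turn lowers the overfitted patch's score.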

📝 Abstract

Software engineers resolving repository-level issues do not treat existing tests as immutable correctness oracles. Instead, they iteratively refine both code and the tests used to characterize intended behavior, as new modifications expose missing assumptions or misinterpreted failure conditions. In contrast, most existing large language model (LLM)-based repair systems adopt a linear pipeline in which tests or other validation signals act mostly as post-hoc filters, treating behavioral constraints as fixed during repair. This formulation reduces repair to optimizing code under static and potentially misaligned constraints, leading to under-constrained search and brittle or overfitted fixes. We argue that repository-level issue resolution is fundamentally not optimization under fixed tests, but search over evolving behavioral constraints. To operationalize this view, we propose Agent-CoEvo, a coevolutionary multi-agent framework in which candidate code patches and test patches are jointly explored and iteratively refined. Rather than treating tests as immutable oracles, our framework models them as dynamic constraints that both guide and are revised by the repair process. Through mutual evaluation and semantic recombination, code and test candidates progressively narrow the space of behavior consistent with the issue description. Evaluated on SWE-bench Lite and SWT-bench Lite, Agent-CoEvo consistently outperforms state-of-the-art agent-based and agentless baselines in both repair success and test reproduction quality. Our findings suggest that enabling repair agents to revise behavioral constraints during search is critical for reliable issue resolution, pointing toward a shift from code-only optimization to coevolution of implementation and specification.
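
To make the narrowing idea concrete, here is a deliberately simplified Python sketch; it is hypothetical and not the paper's algorithm. It omits candidate generation and semantic recombination, which Agent-CoEvo performs with LLM agents, and keeps only the mutual-influence step. A candidate test is adopted only if it fails on the unpatched code and at least one surviving code candidate passes it (code revises tests), and each adopted test prunes the code candidates that fail it (tests guide code). The conservative "prune the fewest" greedy rule, the function `narrow`, and all toy candidates are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical toy stand-ins: code patches as functions, tests as predicates.
CodePatch = Callable[[int], int]
TestPatch = Callable[[CodePatch], bool]


def ok(test: TestPatch, code: CodePatch) -> bool:
    """Run one test on one code candidate; an exception counts as a failure."""
    try:
        return bool(test(code))
    except Exception:
        return False


def narrow(
    original: CodePatch,
    codes: Dict[str, CodePatch],
    test_pool: Dict[str, TestPatch],
    max_rounds: int = 10,
) -> Tuple[List[str], List[List[str]]]:
    """Adopt one test per round and prune the code candidates that fail it."""
    survivors = dict(codes)
    adopted: List[str] = []
    history = [sorted(survivors)]
    for _ in range(max_rounds):
        best: Optional[str] = None
        best_kept: Dict[str, CodePatch] = {}
        for name, test in test_pool.items():
            if name in adopted or ok(test, original):
                continue  # already adopted, or passes on buggy code (no reproduction)
            kept = {c: f for c, f in survivors.items() if ok(test, f)}
            # Reject tests no survivor passes and tests that prune nothing;
            # among the rest, prefer the one that prunes the fewest candidates.
            if 0 < len(kept) < len(survivors) and len(kept) > len(best_kept):
                best, best_kept = name, kept
        if best is None:
            break  # no remaining test refines the candidate set
        adopted.append(best)
        survivors = best_kept
        history.append(sorted(survivors))
    return adopted, history


def original(x: int) -> int:
    return x  # toy bug: negative inputs are not clamped to zero


codes = {
    "identity": lambda x: x,
    "abs": abs,
    "zero": lambda x: 0,
    "clamp": lambda x: max(x, 0),
    "capped": lambda x: min(max(x, 0), 100),
}
test_pool = {
    "neg": lambda f: f(-5) == 0,
    "neg_pos": lambda f: f(-5) == 0 and f(3) == 3,
    "large": lambda f: f(-1) == 0 and f(500) == 500,
    "abs_reading": lambda f: f(-5) == 5,  # misreads the issue as "take abs"
    "old": lambda f: f(-5) == -5,         # passes on original: no reproduction
}
adopted, history = narrow(original, codes, test_pool)
print(adopted)  # ['neg', 'neg_pos', 'large']
print(history)  # [['abs', 'capped', 'clamp', 'identity', 'zero'], ['capped', 'clamp', 'zero'], ['capped', 'clamp'], ['clamp']]
```

In this toy run, the `abs_reading` test, which encodes a misinterpreted failure condition, is never adopted: after the first round no surviving candidate passes it, so it is filtered out rather than treated as a fixed oracle.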

Problem

Research questions and friction points this paper is trying to address.

repository-level issue resolution

behavioral constraints

test evolution

code repair

coevolution

Innovation

Methods, ideas, or system contributions that make the work stand out.

coevolution

multi-agent framework

test evolution