Code-A1: Adversarial Evolving of Code LLM and Test LLM via Reinforcement Learning

📅 2026-03-16
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing reinforcement learning approaches for code generation are constrained by static, low-coverage test suites and suffer from self-collusion or poor generalization when relying on self-generated tests. This work proposes an adversarial co-evolution framework that jointly optimizes large language models for code generation and test generation: the former aims to pass tests, while the latter seeks to expose defects. By employing a decoupled architecture in adversarial training, the framework mitigates self-collusion and enables dynamic, high-quality interaction through white-box-accessible targeted test generation, error-aware experience replay, and a composite reward design. Experiments on Qwen2.5-Coder demonstrate that the method matches or even surpasses models supervised by human-written tests in code generation performance, while substantially enhancing test generation capability.

📝 Abstract
Reinforcement learning for code generation relies on verifiable rewards from unit test pass rates. Yet high-quality test suites are scarce, existing datasets offer limited coverage, and static rewards fail to adapt as models improve. Recent self-play methods unify code and test generation in a single model, but face an inherent dilemma: white-box access leads to self-collusion where the model produces trivial tests for easy rewards, yet black-box restriction yields generic tests that miss implementation-specific bugs. We introduce Code-A1, an adversarial co-evolution framework that jointly optimizes a Code LLM and a Test LLM with opposing objectives. The Code LLM is rewarded for passing more tests, while the Test LLM is rewarded for exposing more defects. This architectural separation eliminates self-collusion risks and safely enables white-box test generation, where the Test LLM can inspect candidate code to craft targeted adversarial tests. We further introduce a Mistake Book mechanism for experience replay and a composite reward balancing test validity with adversarial difficulty. Experiments on Qwen2.5-Coder models demonstrate that Code-A1 achieves code generation performance matching or exceeding models trained on human-annotated tests, while significantly improving test generation capability.
Problem

Research questions and friction points this paper is trying to address.

code generation
test generation
reinforcement learning
self-collusion
adversarial testing
Innovation

Methods, ideas, or system contributions that make the work stand out.

adversarial co-evolution
code generation
test generation
reinforcement learning
white-box testing
Aozhe Wang
Zhejiang University
Yuchen Yan
Zhejiang University
Large Language Models · LLM Reasoning
Nan Zhou
Zhejiang University
Zhengxi Lu
Zhejiang University
MLLM · Agent
Weiming Lu
Zhejiang University
Natural Language Processing · Large Language Models · AGI
Jun Xiao
Zhejiang University
Yueting Zhuang
Zhejiang University
Yongliang Shen
Zhejiang University