🤖 AI Summary
Existing LLM code-generation testing methods rely on static datasets, imposing a “fixed difficulty ceiling” and failing to uncover complex out-of-distribution defects.
Method: We propose a dynamic test-generation framework based on adversarial reinforcement learning, in which a test generator and an adversarial code generator engage in a competitive game. This enables an online, evolving curriculum of test difficulty and self-reinforcing growth in test capability. The framework jointly optimizes two objectives, code correctness and attack success rate, thereby transcending the constraints imposed by static data.
Contribution/Results: Experiments demonstrate that our approach significantly outperforms state-of-the-art baselines on both Best-of-N filtering and reward modeling tasks. To the best of our knowledge, this is the first method to achieve continuous evolution of test strategies together with measurable gains in generalization, marking a fundamental advance in automated, adaptive LLM code testing.
📝 Abstract
Large Language Models (LLMs) excel at code generation, yet their outputs often contain subtle bugs, and effective test cases for exposing them remain a critical bottleneck. Existing test generation methods, whether based on prompting or supervised fine-tuning, rely on static datasets. This imposes a “fixed-difficulty ceiling”, fundamentally limiting their ability to uncover novel or more complex bugs beyond their training scope. To overcome this, we introduce ATGen, a framework that trains a test case generator via adversarial reinforcement learning. ATGen pits a test generator against an adversarial code generator that continuously crafts harder bugs to evade the current policy. This dynamic loop yields a curriculum of steadily increasing difficulty that continually challenges the test generator. The test generator is optimized via Reinforcement Learning (RL) to jointly maximize “Output Accuracy” and “Attack Success”, enabling it to learn a progressively stronger policy that breaks the fixed-difficulty ceiling of static training. Extensive experiments demonstrate that ATGen significantly outperforms state-of-the-art baselines. We further validate its practical utility, showing that it serves both as a more effective filter for Best-of-N inference and as a higher-quality reward source for training code generation models. Our work establishes a new, dynamic paradigm for improving the reliability of LLM-generated code.