AdverMCTS: Combating Pseudo-Correctness in Code Generation via Adversarial Monte Carlo Tree Search

📅 2026-04-12

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses pseudo-correctness in LLM-based code generation: solutions pass the static public test cases yet fail on hidden tests. The authors propose AdverMCTS, an adversarial Monte Carlo Tree Search framework that formulates code generation as a minimax-style game between a Solver and an Attacker. The Solver produces candidate programs, while the Attacker generates targeted corner-case tests that expose logical divergences among the current candidates, so the verification environment becomes progressively stricter. By coupling adversarial test generation with MCTS, the approach actively searches for flaws in candidate code instead of validating against a fixed test set, which the authors identify as the source of overfitting. The reported experiments show AdverMCTS outperforming state-of-the-art baselines, with lower false positive rates and better generalization to unseen test cases.
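To make the failure mode concrete, the following toy example (not taken from the paper) shows a solution that is pseudo-correct in the sense described above. The task, the two implementations, and the test sets are all hypothetical and chosen only for illustration: a maximum-subarray routine that passes every visible test but breaks on an all-negative input that appears only in the hidden set.

```python
from typing import Callable, List, Tuple

Test = Tuple[List[int], int]


def max_subarray_fragile(nums: List[int]) -> int:
    # Hypothetical candidate: starts from 0, so it silently assumes
    # that at least one element is non-negative.
    best = current = 0
    for n in nums:
        current = max(0, current + n)
        best = max(best, current)
    return best


def max_subarray_robust(nums: List[int]) -> int:
    # Kadane's algorithm seeded with the first element, which also
    # handles arrays that contain only negative numbers.
    best = current = nums[0]
    for n in nums[1:]:
        current = max(n, current + n)
        best = max(best, current)
    return best


# Illustrative test sets (assumptions, not benchmark data).
PUBLIC_TESTS: List[Test] = [([1, -2, 3, 4], 7), ([2, 2], 4)]
HIDDEN_TESTS: List[Test] = [([-3, -1, -2], -1), ([5], 5)]


def pass_rate(solution: Callable[[List[int]], int], tests: List[Test]) -> float:
    # Fraction of tests on which the solution returns the expected value.
    passed = sum(1 for args, expected in tests if solution(args) == expected)
    return passed / len(tests)


if __name__ == "__main__":
    for name, fn in [("fragile", max_subarray_fragile),
                     ("robust", max_subarray_robust)]:
        print(name, pass_rate(fn, PUBLIC_TESTS), pass_rate(fn, HIDDEN_TESTS))
```

Both candidates are indistinguishable on the public tests, so a search procedure that scores candidates only against those tests has no signal to prefer the robust one.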

📝 Abstract

Recent advancements in Large Language Models (LLMs) have successfully employed search-based strategies to enhance code generation. However, existing methods typically rely on static, sparse public test cases for verification, leading to pseudo-correctness -- where solutions overfit the visible public tests but fail to generalize to hidden test cases. We argue that optimizing against a fixed, weak environment inherently limits robustness. To address this, we propose AdverMCTS, a novel adversarial Monte Carlo Tree Search framework that combats pseudo-correctness by coupling code search with active vulnerability discovery. AdverMCTS formulates generation as a minimax-style game between a Solver agent, which synthesizes code candidates, and an Attacker agent, which evolves to generate targeted corner test cases that exploit logical divergences in the current code pool. These discovered tests form a dynamic, progressively hostile filter that penalizes fragile reasoning. Extensive experiments demonstrate that AdverMCTS significantly outperforms state-of-the-art baselines, effectively reducing false positive rates and forcing the model to generalize beyond the initial constraints. The resources of this work are available at https://anonymous.4open.science/r/AdverMCTS_open-A255.
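The Solver/Attacker interplay described in the abstract can be sketched, under heavy simplification, as a divergence-driven filtering loop. This is not the authors' implementation: there is no tree search and no LLM here. The Attacker is replaced by a scan over a hypothetical list of boundary inputs, expected outputs come from a trusted oracle (Python's `calendar.isleap`), and the candidate pool, function names, and test values are all assumptions made for the example. How AdverMCTS actually labels the Attacker's tests is not stated in the abstract.

```python
import calendar
from typing import Callable, Iterable, List, Optional, Tuple

Program = Callable[[int], bool]
Test = Tuple[int, bool]


def passes(program: Program, tests: Iterable[Test]) -> bool:
    # A candidate survives only if it matches every accumulated test.
    return all(program(x) == expected for x, expected in tests)


def find_divergence(pool: List[Program], candidates: Iterable[int]) -> Optional[int]:
    # Stand-in for the Attacker: return the first input on which the
    # surviving candidates disagree, or None if they all agree.
    for x in candidates:
        if len({program(x) for program in pool}) > 1:
            return x
    return None


def adversarial_filter(pool: List[Program], public_tests: List[Test],
                       candidates: List[int], oracle: Program,
                       max_rounds: int = 10) -> Tuple[List[Program], List[Test]]:
    tests = list(public_tests)
    survivors = [p for p in pool if passes(p, tests)]
    for _ in range(max_rounds):
        x = find_divergence(survivors, candidates)
        if x is None:
            break  # no input separates the remaining candidates
        tests.append((x, oracle(x)))  # the test set grows each round
        survivors = [p for p in survivors if passes(p, tests)]
    return survivors, tests


# Hypothetical Solver outputs for a leap-year task.
def leap_div4(year: int) -> bool:
    return year % 4 == 0


def leap_no_400(year: int) -> bool:
    return year % 4 == 0 and year % 100 != 0


def leap_correct(year: int) -> bool:
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


POOL = [leap_div4, leap_no_400, leap_correct]
PUBLIC_TESTS = [(2024, True), (2023, False)]       # all three candidates pass
CANDIDATE_INPUTS = [1, 4, 100, 400, 1900, 2000]    # assumed boundary inputs

survivors, tests = adversarial_filter(POOL, PUBLIC_TESTS, CANDIDATE_INPUTS,
                                      calendar.isleap)
print([p.__name__ for p in survivors])  # ['leap_correct']
print(tests)  # [(2024, True), (2023, False), (100, False), (400, True)]
```

Each round adds one test that separates the surviving candidates, so the filter tightens only where the pool actually disagrees. A limitation of this sketch is that a flaw shared by every candidate produces no divergence and therefore goes undetected.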

Problem

Research questions and friction points this paper is trying to address.

pseudo-correctness

code generation

test case generalization

Large Language Models

robustness

Innovation

Methods, ideas, or system contributions that make the work stand out.

Adversarial Monte Carlo Tree Search

Code Generation

Pseudo-Correctness