DuET: Dual Execution for Test Output Prediction with Generated Code and Pseudocode

📅 2026-04-13

🤖 AI Summary
This work addresses the reliability of large language models (LLMs) in test output prediction, where errors stem either from failures when executing generated code or from hallucination when an LLM reasons over pseudocode. The authors propose DuET, a dual-execution framework that combines direct execution of generated code with LLM-simulated execution of pseudocode. Reconciling the outputs of the two execution paths through functional majority voting lets each path compensate for the other's weaknesses. On the LiveCodeBench benchmark, DuET achieves state-of-the-art performance, improving Pass@1 by 13.6 percentage points.

📝 Abstract
This work addresses test output prediction, a key challenge in test case generation. To improve the reliability of LLM-predicted outputs, prior approaches generate code first to ground predictions. One grounding strategy is direct execution of the generated code, but even minor errors can cause failures. To address this, we introduce LLM-based pseudocode execution, which grounds prediction on more error-resilient pseudocode and simulates execution via LLM reasoning. We further propose DuET, a dual-execution framework that combines both approaches through functional majority voting. Our analysis shows that the two approaches are complementary: each offsets the other's weakness, namely code errors in direct execution and hallucination in pseudocode reasoning. On LiveCodeBench, DuET achieves state-of-the-art performance, improving Pass@1 by 13.6 pp.
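
The dual-execution idea can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not define "functional majority voting", so the sketch substitutes a plain majority vote over predicted output strings. The names `run_generated_code`, `majority_vote`, `dual_execution_predict`, the `solve` entry point, and the `simulate` callable (a stand-in for an LLM that reasons over pseudocode) are all hypothetical.

```python
from collections import Counter
from typing import Callable, Iterable, List, Optional


def run_generated_code(source: str, test_input: str,
                       entry_point: str = "solve") -> Optional[str]:
    """Directly execute one generated program; return None if it fails.

    Assumption: each candidate defines a function named `entry_point`.
    exec() is used without sandboxing here, for illustration only.
    """
    namespace: dict = {}
    try:
        exec(source, namespace)
        return str(namespace[entry_point](test_input))
    except Exception:
        # Even a minor error makes direct execution yield no prediction.
        return None


def majority_vote(predictions: Iterable[Optional[str]]) -> Optional[str]:
    """Return the most frequent non-None prediction, or None if there is none."""
    votes = Counter(p for p in predictions if p is not None)
    if not votes:
        return None
    return votes.most_common(1)[0][0]


def dual_execution_predict(code_candidates: List[str],
                           pseudocode_candidates: List[str],
                           simulate: Callable[[str, str], Optional[str]],
                           test_input: str) -> Optional[str]:
    """Pool predictions from both execution paths and vote on the output."""
    direct = [run_generated_code(c, test_input) for c in code_candidates]
    simulated = [simulate(p, test_input) for p in pseudocode_candidates]
    return majority_vote(direct + simulated)


if __name__ == "__main__":
    codes = [
        "def solve(x): return int(x) * 2",  # correct program
        "def solve(x) return int(x) * 2",   # syntax error: execution fails
        "def solve(x): return int(x) + 2",  # runs, but is wrong
    ]
    pseudocodes = ["double the input", "multiply the input by two"]

    def fake_llm(pseudocode: str, test_input: str) -> Optional[str]:
        # Hypothetical stub replacing a real LLM call.
        return str(int(test_input) * 2)

    print(dual_execution_predict(codes, pseudocodes, fake_llm, "3"))
```

In this toy run, one program fails to execute and another returns a wrong answer, yet the simulated path supplies enough agreeing votes for the pooled prediction to settle on the output of the correct program.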
Problem

Research questions and friction points this paper is trying to address.

test output prediction
code generation
pseudocode execution
LLM reliability
test case generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

dual execution
test output prediction
pseudocode execution
LLM reasoning
functional majority voting