ExPairT-LLM: Exact Learning for LLM Code Selection by Pairwise Queries

📅 2025-11-13
🤖 AI Summary
Existing code selection methods can fail to identify the correct program, either because they misidentify nonequivalent programs or because they rely on a large language model (LLM) and assume it always determines the correct output for every input. This work proposes ExPairT-LLM, a code selection algorithm grounded in exact learning that poses two new query types to an LLM oracle: pairwise membership queries and pairwise equivalence queries. These queries are simpler for LLMs to answer, and they let the algorithm choose a program through a tournament that is robust to some LLM mistakes. Evaluated on four popular code datasets, its pass@1 exceeds that of the state-of-the-art code selection algorithm by 13.0% on average and by up to 27.1%. It also improves the pass@1 of LLMs performing complex reasoning by 24.0%.

📝 Abstract
Despite recent advances in LLMs, the task of code generation is still challenging. To cope, code selection algorithms select the best program from multiple programs generated by an LLM. However, existing algorithms can fail to identify the correct program, either because they can misidentify nonequivalent programs or because they rely on an LLM and assume it always correctly determines the output for every input. We present ExPairT-LLM, an exact learning algorithm for code selection that selects a program by posing to an LLM oracle two new types of queries: pairwise membership and pairwise equivalence. These queries are simpler for LLMs and enable ExPairT-LLM to identify the correct program through a tournament, which is robust to some LLM mistakes. We evaluate ExPairT-LLM on four popular code datasets. Its pass@1 (success rate) outperforms the state-of-the-art code selection algorithm on average by +13.0% and up to +27.1%. It also improves the pass@1 of LLMs performing complex reasoning by +24.0%.
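The abstract reports results as pass@1, which it glosses as success rate. As a minimal sketch, assuming exactly one selected program per task and a boolean record of whether that program passed the task's tests, the metric is the fraction of tasks solved. The function name `pass_at_1` and the sample data are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def pass_at_1(selected_passes: Sequence[bool]) -> float:
    """Fraction of tasks whose single selected program passed its tests."""
    if not selected_passes:
        raise ValueError("need at least one task")
    return sum(selected_passes) / len(selected_passes)


# Made-up outcomes for four tasks under two hypothetical selectors.
baseline = pass_at_1([True, False, False, True])
improved = pass_at_1([True, True, False, True])

# Difference expressed in percentage points.
gain_points = (improved - baseline) * 100
```

With this made-up data the two selectors differ on a single task out of four.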
Problem

Research questions and friction points this paper is trying to address.

Selecting the best program from multiple LLM-generated programs
Improving accuracy of code selection by using pairwise queries
Enhancing LLM code generation success rates through exact learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Poses pairwise membership queries to an LLM oracle
Poses pairwise equivalence queries to an LLM oracle
Selects the program through a tournament that is robust to some LLM mistakes
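The tournament idea in the points above can be illustrated with a round-robin sketch. This is not the paper's algorithm, whose details are not given on this page. It assumes that a pairwise equivalence query can be approximated by searching a fixed list of probe inputs for one on which two programs disagree, and that a pairwise membership query asks an oracle which of two differing outputs is correct for that input. All names (`find_differing_input`, `tournament_select`, `reference_oracle`) are hypothetical, and the oracle here is a ground-truth stub standing in for an LLM.

```python
from itertools import combinations
from typing import Callable, List, Optional, Sequence

Program = Callable[[int], int]
# Hypothetical oracle signature: (input, output_a, output_b) -> 0 if it
# prefers output_a, 1 if it prefers output_b.
PairwiseOracle = Callable[[int, int, int], int]


def find_differing_input(p: Program, q: Program,
                         inputs: Sequence[int]) -> Optional[int]:
    """Assumed stand-in for an equivalence check: first probe input where p and q differ."""
    for x in inputs:
        if p(x) != q(x):
            return x
    return None


def tournament_select(programs: Sequence[Program], inputs: Sequence[int],
                      oracle: PairwiseOracle) -> int:
    """Round-robin tournament; returns the index of the program with most wins."""
    wins: List[int] = [0] * len(programs)
    for i, j in combinations(range(len(programs)), 2):
        x = find_differing_input(programs[i], programs[j], inputs)
        if x is None:
            continue  # indistinguishable on the probe inputs: no match played
        choice = oracle(x, programs[i](x), programs[j](x))
        wins[i if choice == 0 else j] += 1
    # Ties are broken in favor of the lowest index.
    return max(range(len(programs)), key=wins.__getitem__)


def identity(x: int) -> int:
    return x


def negate(x: int) -> int:
    return -x


def reference_oracle(x: int, out_a: int, out_b: int) -> int:
    """Stub oracle that knows the intended function is abs()."""
    return 0 if out_a == abs(x) else 1


candidates: List[Program] = [identity, abs, negate]
probe_inputs = [-2, 0, 3]
winner = tournament_select(candidates, probe_inputs, reference_oracle)
```

Because the winner is decided by total wins across many matches, a single wrong oracle answer does not necessarily change the outcome, which is one informal reading of "robust to some LLM mistakes".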