Inference-Time Code Selection via Symbolic Equivalence Partitioning

📅 2026-04-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a limitation of existing Best-of-N code generation approaches, which rely on costly or stochastic external verifiers to identify correct solutions. The authors propose Symbolic Equivalence Partitioning, a framework that uses symbolic execution to group candidate programs by semantic behavior and encodes domain-specific constraints as SMT assumptions to refine these partitions. Selecting a representative program from the dominant functional partition improves selection accuracy without any LLM inference beyond the initial N candidate generations, while the constraints reduce path explosion and keep the search from exploring invalid inputs outside the problem domain. At N=10, average accuracy improves over Pass@1 from 0.728 to 0.803 on HumanEval+ and from 0.516 to 0.604 on LiveCodeBench.

📝 Abstract

"Best-of-N" selection is a popular inference-time scaling method for code generation using Large Language Models (LLMs). However, to reliably identify correct solutions, existing methods often depend on expensive or stochastic external verifiers. In this paper, we propose Symbolic Equivalence Partitioning, a selection framework that uses symbolic execution to group candidate programs by semantic behavior and select a representative from the dominant functional partition. To improve grouping and selection, we encode domain-specific constraints as Satisfiability Modulo Theories (SMT) assumptions during symbolic execution to reduce path explosion and prevent invalid input searches outside the problem domain. At N=10, our method improves average accuracy over Pass@1 from 0.728 to 0.803 on HumanEval+ and from 0.516 to 0.604 on LiveCodeBench, without requiring any additional LLM inference beyond the initial N candidate generations.
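The partition-and-select idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract describes symbolic execution with SMT-encoded assumptions, whereas this sketch substitutes bounded enumeration over a small input domain as a stand-in for the symbolic engine, and a plain Python predicate as a stand-in for the SMT constraint. All names (`fingerprint`, `select_representative`, the four candidate functions, and the integer square root task) are hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict


def fingerprint(program, inputs):
    """Record a program's observable behavior on every input in the domain."""
    behavior = []
    for args in inputs:
        try:
            behavior.append(("ok", repr(program(*args))))
        except Exception as exc:  # an exception is also observable behavior
            behavior.append(("error", type(exc).__name__))
    return tuple(behavior)


def select_representative(candidates, inputs):
    """Group candidates by behavior and pick one from the largest group.

    Returns (index of the chosen candidate, list of partitions), where each
    partition is a list of candidate indices that behaved identically.
    """
    partitions = defaultdict(list)
    for index, program in enumerate(candidates):
        partitions[fingerprint(program, inputs)].append(index)
    groups = list(partitions.values())
    dominant = max(groups, key=len)  # ties resolve to the first group seen
    return dominant[0], groups


# Hypothetical candidates for "integer square root of a non-negative n".
def cand_float(n):
    return int(n ** 0.5)


def cand_isqrt(n):
    return math.isqrt(n)


def cand_loop(n):
    r = 0
    while (r + 1) * (r + 1) <= n:
        r += 1
    return r


def cand_wrong(n):
    return n // 2  # incorrect: diverges from the others at n = 6


candidates = [cand_float, cand_isqrt, cand_loop, cand_wrong]

# Assumed search space: small integers, including invalid negative inputs.
all_inputs = [(n,) for n in range(-3, 21)]


# Stand-in for a domain constraint (n >= 0) that an SMT assumption would encode.
def in_domain(args):
    return args[0] >= 0


valid_inputs = [args for args in all_inputs if in_domain(args)]

chosen, groups = select_representative(candidates, valid_inputs)
_, unconstrained_groups = select_representative(candidates, all_inputs)
```

In this toy setup, the three correct candidates disagree on negative inputs (one raises `TypeError`, one raises `ValueError`, one returns 0), so without the constraint every candidate lands in its own partition and no group dominates. Restricting the inputs to the problem domain merges the three correct candidates into a single dominant partition, which is the effect the abstract attributes to encoding domain constraints.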

Problem

Research questions and friction points this paper is trying to address.

code generation

inference-time selection

program verification

Large Language Models

correctness identification

Innovation

Methods, ideas, or system contributions that make the work stand out.

Symbolic Execution

Equivalence Partitioning

SMT Constraints