🤖 AI Summary
Self-consistency is compute-inefficient in constrained domains such as mathematics and code generation because it samples with replacement, repeatedly revisiting the same high-probability prefixes and producing duplicate completions. This work proposes Distinct Leaf Enumeration (DLE), a deterministic decoding method for test-time inference that treats truncated sampling as traversal of a pruned decoding tree. By systematically enumerating distinct leaves and reusing shared prefixes, DLE reduces redundant token generation and increases coverage of the truncated search space under a fixed budget. Empirically, DLE explores higher-quality reasoning traces than stochastic self-consistency and performs better on math, coding, and general reasoning tasks.
📝 Abstract
Self-consistency boosts inference-time performance by sampling multiple reasoning traces in parallel and voting. However, in constrained domains like math and code, this strategy is compute-inefficient because it samples with replacement, repeatedly revisiting the same high-probability prefixes and duplicate completions. We propose Distinct Leaf Enumeration (DLE), a deterministic decoding method that treats truncated sampling as traversal of a pruned decoding tree and systematically enumerates distinct leaves instead of sampling with replacement. This strategy improves inference efficiency in two ways. Algorithmically, it increases coverage of the truncated search space under a fixed budget by exploring previously unvisited high-probability branches. Systemically, it reuses shared prefixes and reduces redundant token generation. Empirically, DLE explores higher-quality reasoning traces than stochastic self-consistency, yielding better performance on math, coding, and general reasoning tasks.
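The idea of enumerating distinct leaves of a pruned decoding tree can be illustrated with a minimal sketch. This is not the authors' implementation: the toy model `toy_next_token_probs`, the top-k truncation rule, the `<eos>` marker, and the best-first visiting order are all assumptions made for illustration, since the abstract does not specify the truncation scheme or the traversal order.

```python
import heapq
import math
from typing import Callable, Dict, List, Tuple

Prefix = Tuple[str, ...]


def toy_next_token_probs(prefix: Prefix) -> Dict[str, float]:
    """Hypothetical stand-in for a language model's next-token distribution."""
    if len(prefix) >= 3:
        return {"<eos>": 1.0}  # force termination after three tokens
    return {"a": 0.6, "b": 0.3, "c": 0.1}


def truncate_top_k(probs: Dict[str, float], k: int) -> Dict[str, float]:
    """Keep the k most likely tokens and renormalize (assumed truncation rule)."""
    top = sorted(probs.items(), key=lambda kv: -kv[1])[:k]
    total = sum(p for _, p in top)
    return {tok: p / total for tok, p in top}


def enumerate_distinct_leaves(
    next_probs: Callable[[Prefix], Dict[str, float]],
    budget: int,
    top_k: int = 2,
) -> List[Tuple[Prefix, float]]:
    """Return up to `budget` distinct completed sequences, most probable first.

    The heap holds (negative log-probability, prefix) pairs. Each prefix is
    expanded at most once, so no leaf is produced twice and siblings share the
    work already done for their common prefix.
    """
    heap: List[Tuple[float, Prefix]] = [(0.0, ())]
    leaves: List[Tuple[Prefix, float]] = []
    while heap and len(leaves) < budget:
        neg_logp, prefix = heapq.heappop(heap)
        if prefix and prefix[-1] == "<eos>":
            leaves.append((prefix, math.exp(-neg_logp)))
            continue
        for tok, p in truncate_top_k(next_probs(prefix), top_k).items():
            heapq.heappush(heap, (neg_logp - math.log(p), prefix + (tok,)))
    return leaves


if __name__ == "__main__":
    for seq, prob in enumerate_distinct_leaves(toy_next_token_probs, budget=4):
        print(" ".join(seq), round(prob, 4))
```

Sampling with replacement from the same truncated tree can return the most probable leaf many times, whereas this traversal spends each unit of budget on a leaf it has not yet produced. In a real serving stack, the shared prefixes would correspond to reused cached computation rather than Python tuples; that mapping is likewise an assumption here.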