Efficient Test-Time Inference via Deterministic Exploration of Truncated Decoding Trees

📅 2026-04-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Self-consistency is compute-inefficient in constrained domains such as mathematics and code generation because it samples with replacement, repeatedly regenerating the same high-probability prefixes and duplicate completions. This work proposes Distinct Leaf Enumeration (DLE), a deterministic decoding method for test-time inference that treats truncated sampling as traversal of a pruned decoding tree. By enumerating distinct leaves and reusing shared prefixes, DLE avoids redundant token generation and covers more of the truncated search space under a fixed budget. Empirically, DLE explores higher-quality reasoning traces than stochastic self-consistency and performs better on math, coding, and general reasoning tasks.
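To make the redundancy argument concrete, the following sketch compares how many distinct completions a fixed budget is expected to reach under sampling with replacement versus enumeration without repeats. The leaf probabilities and the budget are hypothetical values chosen for illustration, not figures from the paper.

```python
def expected_distinct(probs, n):
    """Expected number of distinct leaves seen in n i.i.d. draws with replacement.

    Leaf i is missed by all n draws with probability (1 - p_i) ** n,
    so it is seen at least once with probability 1 - (1 - p_i) ** n.
    """
    return sum(1.0 - (1.0 - p) ** n for p in probs)


# Hypothetical truncated leaf distribution: one dominant completion
# and several rarer ones (an assumption for illustration only).
leaf_probs = [0.6, 0.2, 0.1, 0.05, 0.05]
budget = 5

# Sampling with replacement keeps redrawing the dominant leaf.
sampled = expected_distinct(leaf_probs, budget)

# Enumeration never repeats a leaf, so coverage is limited only by the budget.
enumerated = min(budget, len(leaf_probs))

print(f"sampling with replacement: {sampled:.2f} distinct leaves on average")
print(f"enumeration:               {enumerated} distinct leaves")
```

With this toy distribution, five stochastic samples reach about 2.5 distinct completions on average, while five enumerated leaves are all distinct by construction.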

📝 Abstract

Self-consistency boosts inference-time performance by sampling multiple reasoning traces in parallel and voting. However, in constrained domains like math and code, this strategy is compute-inefficient because it samples with replacement, repeatedly revisiting the same high-probability prefixes and duplicate completions. We propose Distinct Leaf Enumeration (DLE), a deterministic decoding method that treats truncated sampling as traversal of a pruned decoding tree and systematically enumerates distinct leaves instead of sampling with replacement. This strategy improves inference efficiency in two ways. Algorithmically, it increases coverage of the truncated search space under a fixed budget by exploring previously unvisited high-probability branches. Systemically, it reuses shared prefixes and reduces redundant token generation. Empirically, DLE explores higher-quality reasoning traces than stochastic self-consistency, yielding better performance on math, coding, and general reasoning tasks.
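The idea of traversing a truncated decoding tree and emitting each leaf once can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes top-p truncation, a depth-first traversal that visits higher-probability children first, and a hypothetical `toy_model` standing in for a language model's next-token distribution. The paper's actual truncation rule, traversal order, and stopping criteria are not specified in this summary.

```python
from typing import Callable, Dict, Iterator, List, Tuple

Prefix = Tuple[str, ...]
NextTokenFn = Callable[[Prefix], Dict[str, float]]
EOS = "<eos>"


def truncate_top_p(dist: Dict[str, float], top_p: float) -> List[Tuple[str, float]]:
    """Keep the smallest set of highest-probability tokens whose mass reaches top_p."""
    kept, mass = [], 0.0
    for tok, p in sorted(dist.items(), key=lambda kv: -kv[1]):
        kept.append((tok, p))
        mass += p
        if mass >= top_p:
            break
    return kept


def enumerate_leaves(next_token: NextTokenFn, top_p: float, budget: int,
                     max_len: int = 16) -> Iterator[Tuple[Prefix, float]]:
    """Depth-first walk of the truncated tree, yielding each distinct leaf once."""
    emitted = 0
    stack: List[Tuple[Prefix, float]] = [((), 1.0)]
    while stack and emitted < budget:
        prefix, prob = stack.pop()
        if (prefix and prefix[-1] == EOS) or len(prefix) >= max_len:
            emitted += 1
            yield prefix, prob
            continue
        # One model call per internal node: every leaf below reuses this prefix.
        children = truncate_top_p(next_token(prefix), top_p)
        # Push in reverse so the highest-probability child is popped first.
        for tok, p in reversed(children):
            stack.append((prefix + (tok,), prob * p))


calls: List[Prefix] = []  # records every prefix the toy model is asked about


def toy_model(prefix: Prefix) -> Dict[str, float]:
    """Hypothetical next-token distribution (illustrative assumption)."""
    calls.append(prefix)
    if len(prefix) == 0:
        return {"a": 0.6, "b": 0.3, "c": 0.1}
    if len(prefix) == 1:
        return {"x": 0.5, "y": 0.4, "z": 0.1}
    return {EOS: 1.0}


leaves = list(enumerate_leaves(toy_model, top_p=0.85, budget=4))
for seq, prob in leaves:
    print(" ".join(seq), round(prob, 3))
print("model calls:", len(calls))
```

In this toy tree, the four leaves are all distinct and the traversal makes 7 model calls, because the prefixes `a` and `b` are each expanded once and shared by their descendants. Decoding four three-token traces independently without prefix reuse would take 12 calls and could still return duplicates.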

Problem

Research questions and friction points this paper is trying to address.

test-time inference

self-consistency

decoding efficiency

reasoning traces

constrained domains

Innovation

Methods, ideas, or system contributions that make the work stand out.

Distinct Leaf Enumeration

deterministic decoding

truncated decoding tree