The Model Knows, the Decoder Finds: Future Value Guided Particle Power Sampling

📅 2026-05-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Base large language models already assign non-trivial probability to correct multi-step reasoning paths, but existing decoding methods struggle to locate them efficiently. This work proposes Auxiliary Particle Power Sampling (APPS), a training-free decoding framework that maintains a bounded set of partial solution candidates through blockwise parallel particle filtering and uses future-value-guided reweighting and resampling to approximate the sequence-level power distribution. APPS controls computational cost directly through the number of particles, with predictable peak memory, and also supports an amortized variant in which a lightweight learned selection head replaces short-horizon rollouts. Across reasoning benchmarks, APPS improves the accuracy-runtime trade-off of training-free decoding, suggesting that part of the gap to post-trained systems can be recovered at inference time.
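
For context, the need for future-value guidance can be read off from a standard factorization of the sequence-level power target. The notation below (the symbol V_t, a fixed sequence length T) is introduced here for illustration and is not taken from the paper.

```latex
% Sequence-level power target and its future-value term (illustrative notation)
\pi_\alpha(x_{1:T}) \propto p_\theta(x_{1:T})^{\alpha}, \qquad \alpha > 1

% Future-dependent factor: total power mass of all continuations of a prefix
V_t(x_{1:t}) = \sum_{x_{t+1:T}} p_\theta(x_{t+1:T} \mid x_{1:t})^{\alpha}, \qquad V_T \equiv 1

% Prefix marginal and exact next-token conditional of the power target
\pi_\alpha(x_{1:t}) \propto p_\theta(x_{1:t})^{\alpha} \, V_t(x_{1:t})

\pi_\alpha(x_t \mid x_{<t})
  = \frac{p_\theta(x_t \mid x_{<t})^{\alpha} \, V_t(x_{1:t})}{V_{t-1}(x_{1:t-1})}
```

Raising each next-token distribution to the power alpha (low-temperature sampling) corresponds to dropping the ratio V_t / V_{t-1}, which is why an estimate of the future value of each prefix is needed to target the sequence-level distribution.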

📝 Abstract

A recurring pattern in "reasoning without training" is that base LLMs already assign non-trivial probability mass to correct multi-step solutions; the bottleneck is locating these modes efficiently at inference time. Power sampling provides a principled way to bias decoding toward such modes by targeting p_theta(x)^alpha with alpha > 1, but practical approximations must account for future-dependent correction factors that determine which prefixes remain promising. We introduce Auxiliary Particle Power Sampling (APPS), a blockwise particle algorithm for approximating the sequence-level power target with a bounded population of partial solutions. APPS propagates hypotheses in parallel using proposal-corrected power reweighting and refines their survival through future-value-guided selection at resampling boundaries. This redistributes finite compute across competing prefixes rather than committing to a single unfolding path, while providing a direct scaling knob in the particle count and predictable peak memory. We instantiate the future-value signal with short-horizon rollouts and also study an amortized variant that replaces rollouts with a lightweight learned selection head. Across reasoning benchmarks, APPS improves the accuracy-runtime trade-off of training-free decoding and suggests that part of the gap to post-trained systems can be recovered through more faithful inference-time power approximation.
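
As a rough illustration of the blockwise procedure described above, here is a minimal Python sketch on a toy problem. Everything in it is an assumption made for illustration rather than the paper's implementation: the 3-token Markov chain standing in for the LLM (`TRANS`, `INIT`), the function names (`sample_block`, `rollout_value`, `apps_sketch`), the use of the base model itself as the proposal, and the standard auxiliary-particle-filter correction applied at resampling boundaries.

```python
import math
import random

# Hypothetical toy "language model": a first-order Markov chain over 3 tokens.
INIT = [0.5, 0.3, 0.2]
TRANS = [[0.6, 0.3, 0.1],
         [0.2, 0.5, 0.3],
         [0.3, 0.3, 0.4]]


def next_probs(prefix):
    """Next-token distribution of the toy model given a prefix."""
    return TRANS[prefix[-1]] if prefix else INIT


def sample_block(prefix, length, rng):
    """Extend prefix by `length` tokens sampled from the base model.
    Returns (new_sequence, log p(block | prefix))."""
    seq, logp = list(prefix), 0.0
    for _ in range(length):
        probs = next_probs(seq)
        tok = rng.choices(range(len(probs)), weights=probs)[0]
        logp += math.log(probs[tok])
        seq.append(tok)
    return seq, logp


def rollout_value(prefix, alpha, horizon, n_rollouts, rng):
    """Short-horizon Monte Carlo estimate of log E_p[p(continuation)^(alpha-1)],
    i.e. the log power mass of continuations, truncated to `horizon` tokens."""
    logs = [(alpha - 1.0) * sample_block(prefix, horizon, rng)[1]
            for _ in range(n_rollouts)]
    m = max(logs)
    return m + math.log(sum(math.exp(v - m) for v in logs) / len(logs))


def normalize(log_w):
    """Turn log-weights into probabilities that sum to 1."""
    m = max(log_w)
    w = [math.exp(v - m) for v in log_w]
    total = sum(w)
    return [v / total for v in w]


def apps_sketch(n_particles=64, total_len=6, block=2, alpha=4.0,
                horizon=2, n_rollouts=4, seed=0):
    rng = random.Random(seed)
    particles = [[] for _ in range(n_particles)]
    log_w = [0.0] * n_particles
    pos = 0
    while pos < total_len:
        step = min(block, total_len - pos)
        # Propagate: the proposal is the base model p, so the incremental
        # importance weight p^alpha / p equals p^(alpha - 1).
        for i in range(n_particles):
            particles[i], logp = sample_block(particles[i], step, rng)
            log_w[i] += (alpha - 1.0) * logp
        pos += step
        remaining = total_len - pos
        if remaining > 0:
            # Resampling boundary: select prefixes by weight times estimated
            # future value, then divide the value back out (auxiliary
            # particle filter correction) so the target is unchanged.
            h = min(horizon, remaining)
            log_v = [rollout_value(p, alpha, h, n_rollouts, rng) for p in particles]
            sel = normalize([w + v for w, v in zip(log_w, log_v)])
            idx = rng.choices(range(n_particles), weights=sel, k=n_particles)
            particles = [list(particles[j]) for j in idx]
            log_w = [-log_v[j] for j in idx]
    return particles, normalize(log_w)
```

Calling `apps_sketch(n_particles=32, seed=1)` returns 32 complete sequences with normalized weights. In this sketch the particle count is the only knob that changes how many prefixes are held at once, which mirrors the scaling behaviour the abstract describes; replacing `rollout_value` with a cheap learned scorer would correspond loosely to the amortized variant.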

Problem

Research questions and friction points this paper is trying to address.

reasoning without training

power sampling

inference-time decoding

future-value guidance

particle sampling

Innovation

Methods, ideas, or system contributions that make the work stand out.

power sampling

particle filtering

future-value guidance