🤖 AI Summary
Existing work lacks a systematic understanding of the trade-off between accuracy and computational cost in methods that aggregate and prune multiple samples for large language model (LLM) reasoning. This study asks how accurately one can sample from a target distribution under a fixed budget of process reward evaluations, framed within the particle filtering (Sequential Monte Carlo, SMC) paradigm. We identify simple criteria yielding non-asymptotic guarantees for SMC, propose algorithmic improvements, and establish a fundamental performance limit faced by all particle filtering methods. Empirically, error analysis shows that our criteria effectively control sampling error; however, this error control has limited predictive power for final task accuracy, suggesting the need for theoretical frameworks that go beyond the conventional sampling perspective.
📝 Abstract
Inference-time methods that aggregate and prune multiple samples have emerged as a powerful paradigm for steering large language models, yet we lack any principled understanding of their accuracy-cost tradeoffs. In this paper, we introduce a route to rigorously study such approaches using the lens of *particle filtering* algorithms such as Sequential Monte Carlo (SMC). Given a base language model and a *process reward model* estimating expected terminal rewards, we ask: *how accurately can we sample from a target distribution given some number of process reward evaluations?* Theoretically, we identify (1) simple criteria enabling non-asymptotic guarantees for SMC; (2) algorithmic improvements to SMC; and (3) a fundamental limit faced by all particle filtering methods. Empirically, we demonstrate that our theoretical criteria effectively govern the *sampling error* of SMC, though not necessarily its final *accuracy*, suggesting that theoretical perspectives beyond sampling may be necessary.
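To make the setup concrete, here is a minimal sketch of the particle filtering (SMC) loop the abstract describes: particles are partial sequences extended by a base model, reweighted by a process reward model estimating expected terminal reward, and resampled. Everything here is a hedged toy stand-in, not the paper's algorithm: `base_model_step`, `process_reward`, the two-token vocabulary, and the temperature `beta` are all hypothetical illustrations.

```python
import math
import random

random.seed(0)

def base_model_step(prefix):
    # Hypothetical stand-in for the base LLM: propose the next token
    # uniformly from a toy two-token vocabulary.
    return random.choice(["a", "b"])

def process_reward(prefix):
    # Hypothetical process reward model: an estimate of the expected
    # terminal reward of a partial sequence (here, the fraction of "a"s).
    return prefix.count("a") / max(len(prefix), 1)

def smc_sample(num_particles=8, num_steps=5, beta=2.0):
    """Minimal SMC loop: propose from the base model, reweight each
    particle by its process reward, then multinomially resample."""
    particles = [[] for _ in range(num_particles)]
    for _ in range(num_steps):
        # Proposal: extend every particle by one token from the base model.
        particles = [p + [base_model_step(p)] for p in particles]
        # Weighting: one process-reward evaluation per particle per step,
        # so the total budget is num_particles * num_steps evaluations.
        weights = [math.exp(beta * process_reward(p)) for p in particles]
        total = sum(weights)
        probs = [w / total for w in weights]
        # Resampling: duplicate high-reward partial sequences, prune low ones.
        idx = random.choices(range(num_particles), probs, k=num_particles)
        particles = [list(particles[i]) for i in idx]
    return particles

samples = smc_sample()
print(samples[0])
```

The "number of process reward evaluations" in the question above corresponds to `num_particles * num_steps` calls to `process_reward` in this sketch, which is the budget the paper's accuracy-cost analysis is framed around.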