🤖 AI Summary
Large language models (LLMs) often achieve suboptimal alignment at inference time because computation is allocated uniformly across the response. Method: Motivated by the hypothesis that a response's initial tokens are disproportionately critical for alignment, we propose AdaSearch, a runtime-adaptive blockwise search strategy, and extend it into AdaBeam, a tree-based search counterpart. AdaSearch dynamically concentrates a fixed computational budget on early tokens via a sampling schedule and applies this principle to sequential decoding, with candidates selected by Best-of-N comparison. Contribution/Results: Evaluated across eight mainstream LLMs, AdaBeam achieves win rates more than 10% higher than Best-of-N baselines on harmlessness generation, controlled sentiment generation, and mathematical reasoning tasks. This is the first work to systematically identify and exploit the dominant alignment influence of initial tokens, improving both inference efficiency and alignment quality.
📝 Abstract
LLM alignment remains a critical challenge. Inference-time methods provide a flexible alternative to fine-tuning, but their uniform computational effort often yields suboptimal alignment. We hypothesize that for many alignment tasks, the initial tokens of a response are disproportionately critical. To leverage this principle, we introduce AdaSearch, a novel blockwise search strategy. It adaptively allocates a fixed computational budget using a sampling schedule, focusing search effort on these critical tokens. We apply AdaSearch to sequential decoding and introduce its tree-search counterpart, AdaBeam. Our comprehensive evaluation across eight LLMs demonstrates that AdaSearch outperforms strong Best-of-N and fine-tuning baselines. Specifically, win rates improve by over 10% on harmlessness generation, controlled sentiment generation, and mathematical reasoning tasks relative to Best-of-N.
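The abstract's core mechanism, blockwise search with a front-loaded sampling schedule under a fixed total budget, can be sketched in a few lines. The paper does not specify the schedule's exact form, so the geometric decay, the function names (`sampling_schedule`, `ada_search`), and the greedy per-block Best-of-N selection below are illustrative assumptions, not the authors' implementation:

```python
def sampling_schedule(total_budget, num_chunks, decay=0.5):
    """Front-loaded schedule (assumed geometric decay): chunk i gets
    weight decay**i, scaled so counts sum to ~total_budget, min 1 each."""
    weights = [decay ** i for i in range(num_chunks)]
    scale = total_budget / sum(weights)
    return [max(1, round(w * scale)) for w in weights]

def ada_search(generate_chunk, reward, prompt, total_budget, num_chunks):
    """Greedy blockwise search: at each chunk, draw k_i candidate
    continuations and keep the one with the highest reward.

    generate_chunk(text) -> str  samples the next block of tokens;
    reward(text) -> float        scores a partial response.
    Both are placeholders standing in for an LLM and a reward model."""
    text = prompt
    for k in sampling_schedule(total_budget, num_chunks):
        candidates = [text + generate_chunk(text) for _ in range(k)]
        text = max(candidates, key=reward)
    return text
```

With `total_budget=16` and `num_chunks=4`, this schedule allocates 9, 4, 2, and 1 samples to successive blocks, so early tokens receive most of the search effort while the overall candidate count matches a flat Best-of-N budget.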