🤖 AI Summary
Large language models (LLMs) often achieve suboptimal alignment at inference time because computation is allocated uniformly across the response. Method: Motivated by the hypothesis that a response's initial tokens are disproportionately critical for alignment, we propose AdaSearch, a runtime-adaptive blockwise search strategy, and extend it into AdaBeam, a tree-based search counterpart. AdaSearch dynamically concentrates a fixed computational budget on early tokens via a sampling schedule and applies this principle to sequential decoding, with candidates selected by Best-of-N comparison. Contribution/Results: Evaluated across eight mainstream LLMs, AdaBeam achieves win rates more than 10% higher than Best-of-N baselines on harmlessness generation, controlled sentiment generation, and mathematical reasoning tasks. This is the first work to systematically identify and exploit the dominant alignment influence of initial tokens, improving both inference efficiency and alignment quality.
📝 Abstract
LLM alignment remains a critical challenge. Inference-time methods provide a flexible alternative to fine-tuning, but their uniform computational effort often yields suboptimal alignment. We hypothesize that for many alignment tasks, the initial tokens of a response are disproportionately critical. To leverage this principle, we introduce AdaSearch, a novel blockwise search strategy. It adaptively allocates a fixed computational budget using a sampling schedule, focusing search effort on these critical tokens. We apply AdaSearch to sequential decoding and introduce its tree-search counterpart, AdaBeam. Our comprehensive evaluation across eight LLMs demonstrates that AdaSearch outperforms strong Best-of-N and fine-tuning baselines. Specifically, win rates improve by over 10% on harmlessness generation, controlled sentiment generation, and mathematical reasoning tasks relative to Best-of-N.
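The abstract's core mechanism, blockwise search with a front-loaded sampling schedule under a fixed total budget, can be sketched in a few lines. The paper does not specify the schedule's exact form, so the geometric decay, the function names (`sampling_schedule`, `ada_search`), and the greedy per-block Best-of-N selection below are illustrative assumptions, not the authors' implementation:

```python
def sampling_schedule(total_budget, num_chunks, decay=0.5):
    """Front-loaded schedule (assumed geometric decay): chunk i gets
    weight decay**i, scaled so counts sum to ~total_budget, min 1 each."""
    weights = [decay ** i for i in range(num_chunks)]
    scale = total_budget / sum(weights)
    return [max(1, round(w * scale)) for w in weights]

def ada_search(generate_chunk, reward, prompt, total_budget, num_chunks):
    """Greedy blockwise search: at each chunk, draw k_i candidate
    continuations and keep the one with the highest reward.

    generate_chunk(text) -> str  samples the next block of tokens;
    reward(text) -> float        scores a partial response.
    Both are placeholders standing in for an LLM and a reward model."""
    text = prompt
    for k in sampling_schedule(total_budget, num_chunks):
        candidates = [text + generate_chunk(text) for _ in range(k)]
        text = max(candidates, key=reward)
    return text
```

With `total_budget=16` and `num_chunks=4`, this schedule allocates 9, 4, 2, and 1 samples to successive blocks, so early tokens receive most of the search effort while the overall candidate count matches a flat Best-of-N budget.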