Adaptive Blockwise Search: Inference-Time Alignment for Large Language Models

📅 2025-10-27
🤖 AI Summary
Large language models (LLMs) suffer from limited alignment quality at inference time because computational budget is allocated uniformly across the response. Method: Motivated by the hypothesis that the initial tokens of a response are disproportionately critical for alignment, we propose AdaSearch, a runtime-adaptive blockwise search strategy, and extend it into AdaBeam, a tree-based search counterpart. AdaSearch dynamically allocates computation to early tokens via a sampling schedule, integrates sequence-level decoding, and selects candidates through Best-of-N-style comparative evaluation. Contribution/Results: Evaluated across eight mainstream LLMs, AdaSearch achieves win rates more than 10% higher than Best-of-N baselines on harmlessness generation, controlled sentiment generation, and mathematical reasoning tasks. It is the first work to systematically identify and exploit the dominant alignment influence of initial tokens, improving both inference efficiency and alignment fidelity.

📝 Abstract
LLM alignment remains a critical challenge. Inference-time methods provide a flexible alternative to fine-tuning, but their uniform computational effort often yields suboptimal alignment. We hypothesize that for many alignment tasks, the initial tokens of a response are disproportionately more critical. To leverage this principle, we introduce AdaSearch, a novel blockwise search strategy. It adaptively allocates a fixed computational budget using a sampling schedule, focusing search effort on these critical tokens. We apply AdaSearch to sequential decoding and introduce its tree-search counterpart, AdaBeam. Our comprehensive evaluation across eight LLMs demonstrates that AdaSearch outperforms strong Best-of-N and fine-tuning baselines. Specifically, win rates improve by over 10% for harmlessness generation, controlled sentiment generation, and mathematical reasoning tasks relative to Best-of-N.
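The core idea described above can be illustrated with a minimal sketch. Note that `generate_block` (samples one continuation chunk) and `score` (a stand-in reward model) are hypothetical interfaces introduced here for illustration, not the paper's actual API; the exponentially decaying schedule is likewise one plausible instantiation of "focusing search effort on critical initial tokens", not the authors' exact schedule.

```python
def ada_search(generate_block, score, prompt, n_blocks=4, total_budget=16):
    """Sketch of an AdaSearch-style blockwise search.

    generate_block(prefix) -> str : samples one continuation block (assumed).
    score(text) -> float          : stand-in reward model (assumed).
    """
    # Front-loaded schedule: early blocks receive exponentially more samples.
    weights = [2 ** (n_blocks - 1 - i) for i in range(n_blocks)]  # e.g. [8, 4, 2, 1]
    scale = total_budget / sum(weights)
    schedule = [max(1, round(w * scale)) for w in weights]  # approximately total_budget

    prefix = prompt
    for n_samples in schedule:
        # Sample several candidate continuations for this block,
        # then commit to the highest-reward prefix before moving on.
        candidates = [prefix + generate_block(prefix) for _ in range(n_samples)]
        prefix = max(candidates, key=score)
    return prefix
```

The key contrast with uniform search is the schedule: most of the fixed budget is spent disambiguating the first blocks, where (per the paper's hypothesis) alignment outcomes are largely decided.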
Problem

Research questions and friction points this paper is trying to address.

Uniform computational budgets during inference yield suboptimal alignment
Existing methods overlook the disproportionate alignment influence of initial response tokens
Harmlessness and reasoning performance is limited under fixed inference budgets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive blockwise search (AdaSearch) allocates the computational budget via a sampling schedule
Focuses search effort on critical initial tokens; AdaBeam extends this to tree search
Outperforms Best-of-N and fine-tuning baselines on harmlessness, sentiment control, and reasoning tasks
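For contrast with the adaptive schedule, the Best-of-N baseline the paper compares against spends its budget uniformly: sample N complete responses and keep the highest scoring one. A minimal sketch, again assuming hypothetical `generate` and `score` interfaces:

```python
def best_of_n(generate, score, prompt, n=16):
    """Uniform-budget Best-of-N: no search effort is redirected to early tokens."""
    candidates = [generate(prompt) for _ in range(n)]  # n full-length samples
    return max(candidates, key=score)                  # keep the best-scoring one
```

Because every sample is a full response, Best-of-N cannot reallocate effort toward the initial tokens the way a blockwise schedule can, which is the gap AdaSearch targets.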
Authors
Mohammad Atif Quamar — Independent Researcher
Mohammad Areeb — Purdue University
Nishant Sharma — Independent Researcher
Ananth Shreekumar — Purdue University
Jonathan Rosenthal — Purdue University
Muslum Ozgur Ozmen — Arizona State University
Mikhail Kuznetsov — AWS (Machine Learning)
Z. Berkay Celik — Associate Professor of Computer Science, Purdue University (Security and Privacy, Systems Security, Cyber-Physical Systems Security)