GenARM: Reward Guided Generation with Autoregressive Reward Model for Test-time Alignment

📅 2024-10-10
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of efficiently aligning large language models (LLMs) with diverse human preferences at inference time, without fine-tuning or parameter updates. The authors propose a test-time alignment paradigm built on an Autoregressive Reward Model (ARM), a reward parametrization that predicts token-level (next-token) rewards rather than scoring only complete responses. They prove its equivalence to trajectory-level reward modeling under KL-regularized reinforcement learning. By combining test-time decoding interventions with multi-objective preference fusion, the method enables dynamic, low-overhead guidance of large frozen LLMs using smaller reward models. Experiments show that the approach significantly outperforms existing test-time alignment baselines, matches the performance of training-time alignment methods, and provides flexible, retraining-free control across multiple preference dimensions while substantially reducing computational overhead.

📝 Abstract
Large Language Models (LLMs) exhibit impressive capabilities but require careful alignment with human preferences. Traditional training-time methods finetune LLMs using human preference datasets but incur significant training costs and require repeated training to handle diverse user preferences. Test-time alignment methods address this by using reward models (RMs) to guide frozen LLMs without retraining. However, existing test-time approaches rely on trajectory-level RMs, which are designed to evaluate complete responses, making them unsuitable for autoregressive text generation that requires computing next-token rewards from partial responses. To address this, we introduce GenARM, a test-time alignment approach that leverages the Autoregressive Reward Model, a novel reward parametrization designed to predict next-token rewards for efficient and effective autoregressive generation. Theoretically, we demonstrate that this parametrization can provably guide frozen LLMs toward any distribution achievable by traditional RMs within the KL-regularized reinforcement learning framework. Experimental results show that GenARM significantly outperforms prior test-time alignment baselines and matches the performance of training-time methods. Additionally, GenARM enables efficient weak-to-strong guidance, aligning larger LLMs with smaller RMs without the high costs of training larger models. Furthermore, GenARM supports multi-objective alignment, allowing real-time trade-offs between preference dimensions and catering to diverse user preferences without retraining.
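The decoding-time mechanism the abstract describes can be sketched as follows: at each step, the frozen base LLM's next-token logits are shifted by the ARM's token-level reward scores, scaled by the inverse of the KL-regularization strength, and multiple ARMs can be mixed with per-objective weights for multi-objective trade-offs. This is a minimal illustrative sketch, not the paper's implementation; the function name, the use of raw logit addition, and the toy vocabulary are assumptions.

```python
import numpy as np

def guided_next_token_probs(base_logits, arm_scores_list, weights, beta=1.0):
    """Sketch of GenARM-style test-time guidance (hypothetical helper).

    base_logits:     next-token logits from the frozen base LLM, shape (V,)
    arm_scores_list: per-ARM next-token reward scores, each shape (V,)
    weights:         per-ARM coefficients for multi-objective trade-offs
    beta:            KL-regularization strength; smaller beta means the
                     reward signal dominates the base model more strongly
    """
    combined = np.asarray(base_logits, dtype=float).copy()
    for w, arm_scores in zip(weights, arm_scores_list):
        combined += (w / beta) * np.asarray(arm_scores, dtype=float)
    # Softmax over the combined scores yields the guided sampling distribution.
    z = combined - combined.max()
    probs = np.exp(z) / np.exp(z).sum()
    return probs

# Toy usage: a 4-token vocabulary with one ARM that rewards token 2.
base = [2.0, 1.0, 0.5, 0.0]
arm = [0.0, 0.0, 3.0, 0.0]
p = guided_next_token_probs(base, [arm], weights=[1.0], beta=1.0)
```

In this toy example the base model alone would favor token 0, but the ARM's reward shifts the guided distribution toward token 2; raising `beta` pulls the distribution back toward the base model, which mirrors the KL-regularized trade-off described above.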
Problem

Research questions and friction points this paper is trying to address.

Language Model Guidance
Human-aligned Text Generation
Efficient Generation Process
Innovation

Methods, ideas, or system contributions that make the work stand out.

Autoregressive Reward Modeling
Large Language Model Optimization
Cost-Efficient Alignment