Alignment-Aware Decoding

📅 2025-09-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
Alignment of large language models (LLMs) remains a core challenge in NLP. This paper proposes Alignment-Aware Decoding (AAD), the first method to embed alignment optimization directly into the inference stage—without additional training or explicit reward modeling. AAD operates within the standard Direct Preference Optimization (DPO) framework, guiding decoding via implicit reward signals to dynamically adjust the output distribution and improve consistency with human preferences. Its key contributions are twofold: (1) shifting alignment from the training phase to the decoding phase, and (2) enabling self-generation of high-quality preference data from model outputs, thereby alleviating data scarcity in low-resource settings. Experiments demonstrate that AAD significantly outperforms strong baselines across multiple model scales and mainstream alignment benchmarks. Notably, it exhibits exceptional data augmentation and generalization capabilities under data-constrained conditions.

📝 Abstract
Alignment of large language models remains a central challenge in natural language processing. Preference optimization has emerged as a popular and effective method for improving alignment, typically through training-time or prompt-based interventions. In this paper, we introduce alignment-aware decoding (AAD), a method to enhance model alignment directly at inference. Theoretically, AAD can be interpreted as implicit reward optimization, yet it requires no specialized training beyond the standard DPO setup. Empirically, AAD consistently outperforms strong baselines across diverse alignment benchmarks and model scales. Moreover, in data-constrained settings, AAD can produce high-quality synthetic data to improve alignment under standard decoding, providing a practical solution when labeled data is limited.
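The abstract's view of AAD as implicit reward optimization at decoding time can be sketched as follows. Under DPO, the implicit reward is proportional to `log pi_theta(y|x) - log pi_ref(y|x)`, so one plausible per-step realization is to boost tokens by that gap. The reweighting rule and the guidance strength `alpha` here are illustrative assumptions, not the paper's exact formulation.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def aad_guided_logprobs(policy_logits, ref_logits, alpha=1.0):
    """Reweight one decoding step by the DPO implicit reward signal.

    The per-token gap log pi_theta - log pi_ref (proportional to the
    DPO implicit reward) is added, scaled by `alpha`, to the policy
    log-probs, biasing sampling toward tokens the aligned policy
    prefers over the reference model. `alpha` is a hypothetical
    guidance knob; alpha=0 recovers standard decoding.
    """
    lp = log_softmax(policy_logits)
    lr = log_softmax(ref_logits)
    return log_softmax([p + alpha * (p - r) for p, r in zip(lp, lr)])
```

With a large enough `alpha`, the guided distribution can flip the greedy choice toward a token that the policy ranks much higher than the reference does, even if it was not the policy's top token on its own.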
Problem

Research questions and friction points this paper is trying to address.

Enhancing LLM alignment during inference without training
Addressing data constraints via synthetic data generation
Improving alignment across diverse benchmarks and model scales
Innovation

Methods, ideas, or system contributions that make the work stand out.

Alignment-aware decoding enhances model alignment at inference
Method requires no specialized training beyond standard DPO
Produces synthetic data to improve alignment in limited-data settings
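The synthetic-data idea above can be sketched with the same implicit reward: sample several candidate completions, score each at the sequence level by the scaled log-prob gap between policy and reference, and keep the best and worst as a (chosen, rejected) pair. The triple layout, `beta`, and the best-vs-worst pairing rule are assumptions for illustration, not the paper's exact recipe.

```python
def implicit_reward(policy_lp, ref_lp, beta=0.1):
    """Sequence-level DPO implicit reward: beta * (log pi_theta - log pi_ref).
    Inputs are total sequence log-probabilities under each model."""
    return beta * (policy_lp - ref_lp)

def make_preference_pair(candidates):
    """Rank sampled candidates by implicit reward and return a
    (chosen, rejected) pair usable for further preference training.

    `candidates` is a list of (text, policy_logprob, ref_logprob)
    triples for completions sampled from the model itself.
    """
    scored = sorted(candidates,
                    key=lambda c: implicit_reward(c[1], c[2]),
                    reverse=True)
    return scored[0][0], scored[-1][0]  # best as chosen, worst as rejected
```

Because the scoring reuses the policy and reference models already required by DPO, no external reward model or human labels are needed, which is what makes the approach attractive in data-constrained settings.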