Well Begun is Half Done: Low-resource Preference Alignment by Weak-to-Strong Decoding

📅 2025-06-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) struggle to achieve both high alignment quality and strong generative capability under low-resource constraints. Method: the paper observes, empirically, that the difficulty of generating aligned responses concentrates in the early decoding phase, and accordingly proposes the Weak-to-Strong Decoding (WSD) framework, a two-stage collaborative paradigm: a lightweight, well-aligned Pilot-3B draft model generates the initial tokens, a large base model continues the rest, and an auto-switch mechanism decides when to hand off between the two. To train the draft model, the authors construct the GenerAlign dataset and fine-tune Pilot-3B on it. Contribution/Results: across multiple preference-alignment benchmarks, WSD significantly outperforms low-resource baselines, substantially improving response alignment while avoiding degradation on downstream tasks (the "alignment tax").

📝 Abstract
Large Language Models (LLMs) require alignment with human preferences to avoid generating offensive, false, or meaningless content. Recently, low-resource methods for LLM alignment have been popular, while still facing challenges in obtaining both high-quality and aligned content. Motivated by the observation that the difficulty of generating aligned responses is concentrated at the beginning of decoding, we propose a novel framework, Weak-to-Strong Decoding (WSD), to enhance the alignment ability of base models by the guidance of a small aligned model. The small model first drafts well-aligned beginnings, followed by the large base model to continue the rest, controlled by a well-designed auto-switch mechanism. We also collect a new dataset, GenerAlign, to fine-tune a small-sized Pilot-3B as the draft model, which effectively enhances different base models under the WSD framework to outperform all baseline methods, while avoiding degradation on downstream tasks, termed as the alignment tax. Extensive experiments are further conducted to examine the impact of different settings and time efficiency, as well as analyses on the intrinsic mechanisms of WSD in depth.
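The two-stage decoding paradigm described in the abstract can be sketched as a simple generation loop. This is a minimal illustration, not the paper's implementation: `pilot` and `base` stand in for the small aligned model and the large base model (in practice, calls to an LM's next-token API), and the `switch` criterion is a placeholder for the paper's auto-switch mechanism.

```python
def weak_to_strong_decode(pilot, base, prompt, switch, max_tokens=50):
    """Two-stage decoding: draft the beginning with `pilot`,
    then let `base` continue the rest.

    `switch(tokens)` returns True once the aligned prefix is judged
    long enough to hand off to the base model (the auto-switch
    criterion here is a hypothetical stand-in).
    """
    tokens = list(prompt)
    # Stage 1: the small aligned model drafts a well-aligned opening.
    while len(tokens) < max_tokens and not switch(tokens):
        tokens.append(pilot(tokens))
    # Stage 2: the large base model continues from the aligned prefix.
    while len(tokens) < max_tokens:
        tokens.append(base(tokens))
    return tokens


# Toy demo: the "pilot" always emits "p", the "base" always emits "b",
# and we hand off after five drafted tokens.
out = weak_to_strong_decode(
    pilot=lambda t: "p",
    base=lambda t: "b",
    prompt=["<s>"],
    switch=lambda t: len(t) >= 6,
    max_tokens=10,
)
print(out)  # ['<s>', 'p', 'p', 'p', 'p', 'p', 'b', 'b', 'b', 'b']
```

The key design point the sketch captures is that the draft model only controls the beginning of the response, where the paper finds alignment difficulty is concentrated, so the large model's generative capability is preserved for the remainder.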
Problem

Research questions and friction points this paper is trying to address.

Aligning LLMs with human preferences using low-resource methods
Generating high-quality and aligned content simultaneously
Avoiding alignment tax on downstream tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Weak-to-Strong Decoding for alignment
Auto-switch mechanism controls generation
Pilot-3B draft model enhances performance
Feifan Song
Peking University
Natural Language Processing
Shaohang Wei
State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University
Wen Luo
Peking University
Yuxuan Fan
Peking University
Natural Language Processing
Tianyu Liu
State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University
Guoyin Wang
State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University
Houfeng Wang
State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University