🤖 AI Summary
Preference tuning often suffers significant degradation in performance and utility under cross-domain distribution shift. This work systematically investigates how well different preference alignment objectives generalize under domain transfer and, for the first time, compares multiple adaptation strategies on summarization and question-answering tasks. Comparing pseudo-labeling, supervised fine-tuning, and multi-objective alignment, the experiments demonstrate that pseudo-labeling effectively mitigates the performance deterioration caused by domain shift and substantially improves cross-domain generalization. The study provides both an empirical foundation and practical solutions for robust domain transfer in preference-based fine-tuning.
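The paper does not include code here, but the pseudo-labeling idea can be sketched in a few lines: rank unlabeled target-domain response pairs with a scorer trained on the source domain (e.g., a reward model), and treat the winner of each pair as the "chosen" response for subsequent preference tuning. The function and variable names below are illustrative, not the authors' implementation:

```python
from typing import Callable

def pseudo_label(prompts, responses_a, responses_b,
                 score_fn: Callable[[str, str], float]):
    """Turn unlabeled target-domain response pairs into preference pairs
    by ranking each pair with a source-domain scorer (e.g., a reward model)."""
    pairs = []
    for prompt, a, b in zip(prompts, responses_a, responses_b):
        # The higher-scoring response becomes the pseudo "chosen" label.
        chosen, rejected = (a, b) if score_fn(prompt, a) >= score_fn(prompt, b) else (b, a)
        pairs.append({"prompt": prompt, "chosen": chosen, "rejected": rejected})
    return pairs

# Toy usage: response length as a stand-in scorer, purely for illustration.
demo = pseudo_label(["Summarize: ..."], ["short"], ["a longer summary"],
                    score_fn=lambda p, r: len(r))
```

The resulting pseudo-labeled pairs can then be fed to any preference alignment objective in place of human-annotated target-domain preferences.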
📝 Abstract
Preference tuning aligns pretrained language models with human judgments of quality, helpfulness, or safety by optimizing over explicit preference signals rather than likelihood alone. Prior work has shown that preference tuning degrades performance and reduces helpfulness when evaluated outside the training domain; however, the extent to which adaptation strategies mitigate this domain shift remains unexplored. We address this gap with a comprehensive, systematic study of alignment generalization under domain shift. We compare five popular alignment objectives and several adaptation strategies from the source to the target domain, including target-domain supervised fine-tuning and pseudo-labeling, across summarization and question-answering helpfulness tasks. Our findings reveal systematic differences in how alignment objectives generalize under domain shift, and we show that adaptation strategies based on pseudo-labeling can substantially reduce domain-shift degradation.
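The abstract does not list the five alignment objectives compared, but Direct Preference Optimization (DPO) is one widely used example of an objective that optimizes explicit preference signals rather than likelihood alone. A minimal sketch of its loss, assuming per-sequence log-probabilities have already been summed over tokens (all names illustrative):

```python
import torch
import torch.nn.functional as F

def dpo_loss(pi_chosen_logp, pi_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Direct Preference Optimization loss over per-sequence log-probabilities."""
    # Implicit rewards: policy log-prob ratios against a frozen reference model.
    chosen_reward = beta * (pi_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (pi_rejected_logp - ref_rejected_logp)
    # Bradley-Terry objective: push the chosen response above the rejected one.
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()

# Toy tensors standing in for summed token log-probabilities.
loss = dpo_loss(torch.tensor([-12.0]), torch.tensor([-15.0]),
                torch.tensor([-13.0]), torch.tensor([-14.5]))
```

Under domain shift, the same loss can be applied to pseudo-labeled target-domain pairs, which is the adaptation route the study finds most effective.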