Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization

📅 2024-07-10
🏛️ arXiv.org
📈 Citations: 12
✨ Influential: 0
🤖 AI Summary
This paper addresses two prevalent noise types in Direct Preference Optimization (DPO): low-quality individual samples (pointwise noise) and incorrectly labeled preference pairs (pairwise noise). The authors propose Dr. DPO, the first robust DPO framework with both theoretical guarantees and practical efficacy. They establish theoretically that standard DPO inherently possesses distributional robustness to pointwise noise, and leverage this insight to design an explicitly robust objective against pairwise noise: optimization against a worst-case pairwise distribution, governed by a new hyperparameter β′ that balances exploration and exploitation. Theoretical analysis proves convergence and quantifies robustness bounds. Empirical results demonstrate that Dr. DPO consistently improves generation quality and response accuracy on both noisy and clean preference datasets. The implementation is publicly available.
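
For concreteness, the objective described above can be sketched in standard DPO notation (this rendering is reconstructed from the summary, not copied from the paper; σ is the sigmoid and π_ref the frozen reference policy):

```latex
% Per-pair implicit reward margin:
h_\theta(x, y_w, y_l) \;=\; \log\frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
                      \;-\; \log\frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}

% Standard DPO averages the per-pair log-loss:
\mathcal{L}_{\mathrm{DPO}} \;=\; -\,\mathbb{E}_{(x,y_w,y_l)\sim\mathcal{D}}
    \big[\log \sigma\big(\beta\, h_\theta(x, y_w, y_l)\big)\big]

% Dr. DPO aggregates the same terms through a soft worst-case reweighting:
\mathcal{L}_{\mathrm{Dr.DPO}} \;=\; -\,\beta' \log\, \mathbb{E}_{(x,y_w,y_l)\sim\mathcal{D}}
    \Big[\exp\!\Big(\tfrac{1}{\beta'}\,\log \sigma\big(\beta\, h_\theta(x, y_w, y_l)\big)\Big)\Big]
```

As β′ → ∞ the Dr. DPO objective reduces to the plain DPO average, while smaller β′ softly down-weights pairs the model finds implausible, which is where mislabeled pairs tend to concentrate.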

📝 Abstract
This study addresses the challenge of noise in training datasets for Direct Preference Optimization (DPO), a method for aligning Large Language Models (LLMs) with human preferences. We categorize noise into pointwise noise, which includes low-quality data points, and pairwise noise, which encompasses erroneous data pair associations that affect preference rankings. Utilizing Distributionally Robust Optimization (DRO), we enhance DPO's resilience to these types of noise. Our theoretical insights reveal that DPO inherently embeds DRO principles, conferring robustness to pointwise noise, with the regularization coefficient $\beta$ playing a critical role in its noise resistance. Extending this framework, we introduce Distributionally Robustifying DPO (Dr. DPO), which integrates pairwise robustness by optimizing against worst-case pairwise scenarios. The novel hyperparameter $\beta'$ in Dr. DPO allows for fine-tuned control over data pair reliability, providing a strategic balance between exploration and exploitation in noisy training environments. Empirical evaluations demonstrate that Dr. DPO substantially improves the quality of generated text and response accuracy in preference datasets, showcasing enhanced performance in both noisy and noise-free settings. The code is available at https://github.com/junkangwu/Dr_DPO.
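
A minimal PyTorch sketch of this batch-level objective follows. The function name, signature, and tensor layout are assumptions for illustration, not the authors' implementation; see the linked repository for the reference code:

```python
import math

import torch
import torch.nn.functional as F

def dr_dpo_loss(policy_chosen_logps: torch.Tensor,
                policy_rejected_logps: torch.Tensor,
                ref_chosen_logps: torch.Tensor,
                ref_rejected_logps: torch.Tensor,
                beta: float = 0.1,
                beta_prime: float = 1.0) -> torch.Tensor:
    """Batch-level Dr. DPO loss: -beta' * log E[exp(log sigma(beta * h) / beta')].

    Each *_logps tensor holds per-example sequence log-probabilities, shape (B,).
    """
    # Implicit reward margin h: log-ratio of the chosen minus the rejected response.
    h = (policy_chosen_logps - ref_chosen_logps) - (policy_rejected_logps - ref_rejected_logps)
    # Per-pair DPO term log sigma(beta * h); always <= 0, very negative for
    # pairs the current policy considers likely mislabeled.
    per_pair = F.logsigmoid(beta * h)
    # log E[exp(z)] computed stably in log space as logsumexp(z) - log(N).
    log_mean_exp = torch.logsumexp(per_pair / beta_prime, dim=0) - math.log(per_pair.numel())
    # beta_prime -> inf recovers the plain DPO average over the batch;
    # smaller beta_prime softly down-weights low-confidence pairs.
    return -beta_prime * log_mean_exp
```

Keeping the computation in log space via `logsigmoid` and `logsumexp` avoids underflow when σ(βh) is close to zero for hard or noisy pairs.
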
Problem

Research questions and friction points this paper is trying to address.

Enhancing DPO robustness to noisy training data
Addressing pointwise and pairwise noise in LLM alignment
Optimizing against worst-case pairwise distributions for reliable preference learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Enhances DPO with Distributionally Robust Optimization
Introduces Dr. DPO for pairwise noise robustness
Uses a novel hyperparameter β′ to control trust in data-pair reliability (see the sanity check below)
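
As a quick numerical sanity check on the `dr_dpo_loss` sketch above (fabricated values, double precision to keep the large-β′ limit stable), the β′ → ∞ case should match the ordinary DPO average:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
# Fabricated per-example log-probabilities for a batch of 8 preference pairs.
pc, pr, rc, rr = (torch.randn(8, dtype=torch.float64) for _ in range(4))

h = (pc - rc) - (pr - rr)
dpo_mean = -F.logsigmoid(0.1 * h).mean()

# With a very large beta', Dr. DPO collapses to the ordinary DPO average;
# with a small beta', low-margin (possibly mislabeled) pairs contribute less.
print(torch.allclose(dr_dpo_loss(pc, pr, rc, rr, beta=0.1, beta_prime=1e6),
                     dpo_mean, atol=1e-6))  # True
```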