Geometric-Averaged Preference Optimization for Soft Preference Labels

📅 2024-09-10
🏛️ arXiv.org
📈 Citations: 2
Influential: 0
🤖 AI Summary
Existing preference optimization methods rely on binary labels and therefore fail to capture the individual variability and gradational ambiguity of human preferences, leading to objective mismatch and over-optimization. To address this, the paper proposes a soft preference modeling framework built on Direct Preference Optimization (DPO): a loss function based on a weighted geometric average of the LLM output likelihoods, so that the learning signal decays smoothly as the two responses approach being equally preferred. The soft labels are distributional, fine-grained preference probabilities (simulated in the experiments with AI feedback from LLMs), and the modification plugs into any DPO-based method without architectural changes. Evaluation on standard alignment benchmarks shows consistent improvements in response quality, particularly where modestly-confident soft labels are in the majority. These results support the claim that soft preference modeling improves both robustness and generalization in preference-aligned LLMs.

📝 Abstract
Many algorithms for aligning LLMs with human preferences assume that human preferences are binary and deterministic. However, human preferences can vary across individuals, and therefore should be represented distributionally. In this work, we introduce distributional soft preference labels and improve Direct Preference Optimization (DPO) with a weighted geometric average of the LLM output likelihood in the loss function. This approach adjusts the scale of the learning loss based on the soft labels such that the loss approaches zero when the responses are closer to equally preferred. This simple modification can be easily applied to any DPO-based method and mitigates over-optimization and objective mismatch, which prior works suffer from. Our experiments simulate the soft preference labels with AI feedback from LLMs and demonstrate that geometric averaging consistently improves performance on standard benchmarks for alignment research. In particular, we observe more preferable responses than with binary labels, and significant improvements where modestly-confident labels are in the majority.
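The mechanism described in the abstract can be sketched in a few lines. The sketch below is an illustration, not the paper's exact implementation: it assumes each response's log-likelihood ratio against the reference policy is available as a scalar, and it uses the fact that weighting the two responses' likelihoods by a soft label `p_soft` (a geometric average in probability space) collapses to rescaling the standard DPO margin by `2*p_soft - 1`, which makes the loss gradient vanish as the label approaches 0.5 (equally preferred).

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def dpo_loss(logr_w: float, logr_l: float, beta: float = 0.1) -> float:
    """Standard DPO loss for one pair.

    logr_w / logr_l: log(pi(y)/pi_ref(y)) for the preferred and
    dispreferred responses (assumed precomputed scalars).
    """
    return -math.log(sigmoid(beta * (logr_w - logr_l)))

def gdpo_loss(logr_w: float, logr_l: float, p_soft: float,
              beta: float = 0.1) -> float:
    """Geometric-averaged DPO sketch with soft label p_soft in [0.5, 1].

    Weighting each side's likelihood by p_soft and (1 - p_soft)
    (a geometric average) rescales the DPO margin by (2*p_soft - 1):
    at p_soft = 1 this recovers standard DPO, and at p_soft = 0.5 the
    margin is zeroed out, so the gradient vanishes for near-ties.
    """
    return -math.log(sigmoid(beta * (2.0 * p_soft - 1.0) * (logr_w - logr_l)))
```

With a hard label (`p_soft = 1.0`) the sketch reduces exactly to `dpo_loss`; with `p_soft = 0.5` it returns the constant `log 2` regardless of the margin, so an equally-preferred pair contributes no gradient, which is the over-optimization safeguard the abstract describes.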
Problem

Research questions and friction points this paper is trying to address.

Complex Preferences
Overfitting
Inconsistent Objectives
Innovation

Methods, ideas, or system contributions that make the work stand out.

Distributional Soft Preference Labels
Geometric Mean Optimization
Direct Preference Optimization (DPO)