Geometric-Averaged Preference Optimization for Soft Preference Labels

📅 2024-09-10
🏛️ arXiv.org
📈 Citations: 2
Influential: 0
🤖 AI Summary
Existing preference optimization methods rely on binary labels and therefore fail to capture the individual variability and gradational ambiguity of human preferences, leading to objective mismatch and over-optimization. To address this, the paper proposes a soft preference modeling framework built on Direct Preference Optimization (DPO): a loss function based on a weighted geometric average of the LLM output likelihoods, so that the learning signal decays smoothly as the two responses approach being equally preferred. The soft labels are distributional, fine-grained preference probabilities (simulated in the experiments with AI feedback from LLMs), and the modification plugs into any DPO-based method without architectural changes. Evaluation on standard alignment benchmarks shows consistent improvements in response quality, particularly where modestly-confident soft labels are in the majority. These results support the claim that soft preference modeling improves both robustness and generalization in preference-aligned LLMs.

📝 Abstract
Many algorithms for aligning LLMs with human preferences assume that human preferences are binary and deterministic. However, human preferences can vary across individuals, and therefore should be represented distributionally. In this work, we introduce distributional soft preference labels and improve Direct Preference Optimization (DPO) with a weighted geometric average of the LLM output likelihood in the loss function. This approach adjusts the scale of the learning loss based on the soft labels such that the loss approaches zero when the responses are closer to equally preferred. This simple modification can be easily applied to any DPO-based method and mitigates over-optimization and objective mismatch, which prior works suffer from. Our experiments simulate the soft preference labels with AI feedback from LLMs and demonstrate that geometric averaging consistently improves performance on standard benchmarks for alignment research. In particular, we observe more preferable responses than with binary labels, and significant improvements where modestly-confident labels are in the majority.
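The mechanism described in the abstract can be sketched in a few lines. The sketch below is an illustration, not the paper's exact implementation: it assumes each response's log-likelihood ratio against the reference policy is available as a scalar, and it uses the fact that weighting the two responses' likelihoods by a soft label `p_soft` (a geometric average in probability space) collapses to rescaling the standard DPO margin by `2*p_soft - 1`, which makes the loss gradient vanish as the label approaches 0.5 (equally preferred).

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def dpo_loss(logr_w: float, logr_l: float, beta: float = 0.1) -> float:
    """Standard DPO loss for one pair.

    logr_w / logr_l: log(pi(y)/pi_ref(y)) for the preferred and
    dispreferred responses (assumed precomputed scalars).
    """
    return -math.log(sigmoid(beta * (logr_w - logr_l)))

def gdpo_loss(logr_w: float, logr_l: float, p_soft: float,
              beta: float = 0.1) -> float:
    """Geometric-averaged DPO sketch with soft label p_soft in [0.5, 1].

    Weighting each side's likelihood by p_soft and (1 - p_soft)
    (a geometric average) rescales the DPO margin by (2*p_soft - 1):
    at p_soft = 1 this recovers standard DPO, and at p_soft = 0.5 the
    margin is zeroed out, so the gradient vanishes for near-ties.
    """
    return -math.log(sigmoid(beta * (2.0 * p_soft - 1.0) * (logr_w - logr_l)))
```

With a hard label (`p_soft = 1.0`) the sketch reduces exactly to `dpo_loss`; with `p_soft = 0.5` it returns the constant `log 2` regardless of the margin, so an equally-preferred pair contributes no gradient, which is the over-optimization safeguard the abstract describes.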
Problem

Research questions and friction points this paper is trying to address.

Complex Preferences
Overfitting
Inconsistent Objectives
Innovation

Methods, ideas, or system contributions that make the work stand out.

Distributional Soft Preference Labels
Geometric Mean Optimization
Direct Preference Optimization (DPO)