How Well Can Preference Optimization Generalize Under Noisy Feedback?

📅 2025-10-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates the generalization performance of preference optimization under noisy human feedback arising from mislabeling and judgment uncertainty. We develop the first theoretical analysis framework for finite-step optimization of mainstream loss functions (including DPO, IPO, and SLiC), integrating explicit noise modeling with statistical learning theory to characterize how generalization error decays under varying noise types and rates. Systematic experiments on real large language models validate the theoretical findings: noise substantially degrades generalization, with its impact growing nonlinearly with the noise rate. Crucially, we provide the first quantitative characterization of the "breakdown threshold" of noise in preference learning, i.e., the critical noise level beyond which reliable alignment deteriorates. This yields both theoretical foundations and practical guidance for designing robust human preference alignment algorithms.

📝 Abstract
As large language models (LLMs) advance their capabilities, aligning these models with human preferences has become crucial. Preference optimization, which trains models to distinguish between preferred and non-preferred responses based on human feedback, has become a central component of LLM alignment. However, most existing works assume noise-free feedback, which is unrealistic given the inherent errors and inconsistencies in human judgments. This paper addresses the impact of noisy feedback on preference optimization, providing generalization guarantees under these conditions. In particular, we consider noise models that correspond to common real-world sources of noise, such as mislabeling and uncertainty. Unlike traditional analyses that assume convergence, our work focuses on finite-step preference optimization, offering new insights that are more aligned with practical LLM training. We describe how generalization decays with different types of noise across noise rates, as a function of the preference data distribution and the number of samples. Our analysis of noisy preference learning applies to a broad family of preference optimization losses, including DPO, IPO, and SLiC. Empirical validation on contemporary LLMs confirms the practical relevance of our findings, offering valuable insights for developing AI systems that align with human preferences.
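The losses in the family the abstract names differ only in how they penalize the implicit reward margin of a preference pair: DPO uses a logistic loss, IPO a squared loss pulling the margin toward a target, and SLiC a hinge. A minimal sketch under that framing (parameter defaults `beta`, `tau`, and `delta` are illustrative assumptions; `margin` abstracts the log-probability-ratio gap):

```python
import math

def dpo(margin, beta=0.1):
    # DPO: logistic loss on the scaled margin, log(1 + exp(-beta * margin)).
    return math.log(1.0 + math.exp(-beta * margin))

def ipo(margin, tau=0.1):
    # IPO: squared loss pulling the margin toward the target 1 / (2 * tau).
    return (margin - 1.0 / (2.0 * tau)) ** 2

def slic(margin, delta=1.0):
    # SLiC: hinge loss, zero once the margin exceeds delta.
    return max(0.0, delta - margin)
```

Viewing the losses as scalar functions of the margin is what lets a single analysis cover the whole family: noise perturbs the margin's sign or magnitude, and each loss's shape determines how that perturbation propagates to the generalization bound.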
Problem

Research questions and friction points this paper is trying to address.

Analyzing how noisy human feedback affects preference optimization generalization in LLMs
Studying generalization decay under different noise types and rates in preference data
Providing finite-step optimization guarantees for alignment methods like DPO and IPO
Innovation

Methods, ideas, or system contributions that make the work stand out.

Analyzes noisy feedback impact on preference optimization
Provides generalization guarantees under finite-step training
Applies to broad family of preference optimization losses
Shawn Im
University of Wisconsin-Madison
Yixuan Li
Department of Computer Sciences, University of Wisconsin-Madison