One Goal, Many Challenges: Robust Preference Optimization Amid Content-Aware and Multi-Source Noise

📅 2025-03-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limited robustness of large language model preference alignment to content-dependent noise—such as response-length and harmfulness biases—in human feedback. We propose a content-aware, noise-robust preference optimization framework. Methodologically, we (i) formally model and disentangle multiple sources of content-dependent noise for the first time; (ii) unify the representation and control of heterogeneous noise via a backdoor attack-inspired mechanism; and (iii) design a theoretically grounded multi-objective optimization paradigm that jointly optimizes primary preference alignment and noise suppression. Empirical evaluation on diverse synthetic noisy datasets demonstrates that our method improves primary-task accuracy by 12.7%, while reducing length bias and harmfulness bias by 38.4% and 29.1%, respectively. The framework provides provable convergence guarantees and advances the robustness of preference learning under realistic, noisy human feedback.

📝 Abstract
Large Language Models (LLMs) have made significant strides in generating human-like responses, largely due to preference alignment techniques. However, these methods often assume unbiased human feedback, which is rarely the case in real-world scenarios. This paper introduces Content-Aware Noise-Resilient Preference Optimization (CNRPO), a novel framework that addresses multiple sources of content-dependent noise in preference learning. CNRPO employs a multi-objective optimization approach to separate true preferences from content-aware noises, effectively mitigating their impact. We leverage backdoor attack mechanisms to efficiently learn and control various noise sources within a single model. Theoretical analysis and extensive experiments on different synthetic noisy datasets demonstrate that CNRPO significantly improves alignment with primary human preferences while controlling for secondary noises and biases, such as response length and harmfulness.
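The abstract describes a multi-objective scheme: a primary preference-alignment objective combined with terms that suppress content-dependent biases. The paper's exact objective is not reproduced here; the sketch below is a minimal illustration (plain Python, with hypothetical log-probabilities and weights) of how a DPO-style primary loss might be combined with weighted bias-suppression terms under that general idea.

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO logistic loss for one preference pair:
    -log sigmoid(beta * (policy log-ratio - reference log-ratio))."""
    margin = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

def multi_objective_loss(primary_pair, bias_pairs, bias_weights):
    """Illustrative combination (not the paper's exact formulation):
    the primary preference loss minus weighted losses computed on pairs
    labeled as reflecting a spurious signal (e.g. length preference),
    discouraging the model from fitting those shortcuts."""
    primary = dpo_loss(*primary_pair)
    bias_term = sum(w * dpo_loss(*p) for w, p in zip(bias_weights, bias_pairs))
    return primary - bias_term
```

With a zero margin the logistic loss reduces to log 2, which makes the sketch easy to sanity-check; real training would compute the log-probabilities from policy and reference models over batches of pairs.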
Problem

Research questions and friction points this paper is trying to address.

Addresses content-dependent noise in preference learning
Separates true preferences from content-aware noises
Improves alignment with human preferences, controls biases
Innovation

Methods, ideas, or system contributions that make the work stand out.

CNRPO framework mitigates content-aware noise
Multi-objective optimization separates true preferences
Backdoor attack mechanisms control noise sources