🤖 AI Summary
This work addresses the limitation of existing preference alignment methods, which typically assume homogeneous preference strength and thus fail to capture the heterogeneity arising from individual and contextual differences in real-world scenarios. To overcome this, we propose MixDPO, the first approach to integrate the mixed logit model from behavioral economics into language model alignment. By extending Direct Preference Optimization, MixDPO explicitly models the distributional variation of preference strength. Grounded in discrete choice theory and deep learning, our method learns the parameters of preference strength distributions directly from preference data, enabling the alignment objective to distinguish between strong and weak preferences. Experiments across three datasets and two open-source models demonstrate that MixDPO improves average alignment performance by 11.2 points on Pythia-2.8B, with particularly pronounced gains in high-heterogeneity settings, while effectively preserving subgroup-specific preferences.
📝 Abstract
Preference-based alignment objectives implicitly assume that all human preferences are expressed with equal strength. In practice, however, preference strength varies across individuals and contexts -- a phenomenon established in behavioral economics and discrete choice theory. This mismatch limits the ability of existing objectives to faithfully capture heterogeneous human judgments. Inspired by this literature, we introduce Mixed Logit Direct Preference Optimization (MixDPO), a generalization of Direct Preference Optimization that models variation in preference strength. MixDPO enables alignment objectives to capture heterogeneity in how strongly preferences are expressed across training examples. We evaluate MixDPO on three preference datasets using two open-weight language models. Across datasets, MixDPO improves aggregate alignment performance (+11.2 points on Pythia-2.8B) while preserving subgroup-level preferences, with the largest gains appearing in settings with higher inferred preference heterogeneity. MixDPO makes preference heterogeneity explicit through learned strength distributions. We release our code for reproducibility.
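To make the idea concrete, here is a minimal sketch of how a mixed-logit extension of the DPO objective could look. Standard DPO uses a single fixed preference-strength coefficient beta; in a mixed logit treatment, beta is instead drawn from a learned distribution and the per-example likelihood is the expectation of the Bradley-Terry choice probability over that distribution. The function name, the log-normal parameterization, and the Monte Carlo estimator below are illustrative assumptions for exposition, not the paper's exact formulation.

```python
import torch

def mixdpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l,
                mu, log_sigma, n_samples=64):
    """Monte Carlo sketch of a mixed-logit DPO objective.

    beta is sampled from a learned LogNormal(mu, sigma) rather than fixed
    as in standard DPO, so the objective averages the Bradley-Terry choice
    probability over a distribution of preference strengths.
    (Hypothetical parameterization, chosen for illustration only.)
    """
    # DPO log-ratio margin between chosen (w) and rejected (l) responses.
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)  # shape: (batch,)

    # Draw positive preference-strength samples via the reparameterization
    # trick, so mu and log_sigma remain differentiable.
    sigma = log_sigma.exp()
    eps = torch.randn(n_samples, 1)
    beta = torch.exp(mu + sigma * eps)                       # (n_samples, 1)

    # Expected choice probability under the beta distribution.
    p = torch.sigmoid(beta * margin.unsqueeze(0)).mean(dim=0)  # (batch,)
    return -torch.log(p + 1e-8).mean()
```

With a zero margin the expected choice probability is exactly 0.5 regardless of beta, recovering a loss of log 2; a larger positive margin for the preferred response lowers the loss, as in standard DPO, while mu and log_sigma let training adapt how concentrated or dispersed preference strength is.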