MiCRo: Mixture Modeling and Context-aware Routing for Personalized Preference Learning

📅 2025-05-30
🤖 AI Summary
Reward modeling is a critical component of RLHF for aligning large language models (LLMs), yet the classical Bradley–Terry (BT) model assumes a single global reward function, failing to capture the inherent heterogeneity and diversity of human preferences and thereby incurring an irreducible error. To address this, we propose a two-stage personalized reward modeling framework. In the first stage, an implicit mixture probability model coupled with a context encoder identifies latent preference subgroups without requiring fine-grained annotations. In the second stage, a dynamic gating mechanism adjusts subgroup weights online to enable context-aware personalization. Our approach is the first to jointly integrate implicit mixture modeling with context-driven online routing. Experiments across multiple datasets demonstrate significant improvements in personalized ranking accuracy and a substantial reduction in irreducible error. This work establishes a scalable, high-fidelity reward modeling paradigm for safe and pluralistic LLM alignment.
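The paper does not publish code on this page, but the first stage (a mixture of BT reward heads gated by a context encoder) can be sketched with linear heads and a softmax gate. Everything below is an illustrative assumption: the function name, the linear parameterization, and the feature inputs are not taken from the paper.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mixture_bt_nll(ctx, feat_w, feat_l, W_gate, W_reward):
    """Negative log-likelihood of `chosen > rejected` under a K-head BT mixture.

    Hypothetical linear parameterization (not the paper's architecture):
    ctx:      (n, d_c) context features, e.g. a prompt embedding
    feat_w/l: (n, d_r) features of the chosen / rejected response
    W_gate:   (d_c, K) context encoder producing mixture logits
    W_reward: (K, d_r) one linear reward head per latent subgroup
    """
    pi = softmax(ctx @ W_gate)                  # (n, K) context-aware mixture weights
    margin = (feat_w - feat_l) @ W_reward.T     # (n, K) per-head reward margins
    p = (pi * sigmoid(margin)).sum(axis=1)      # mixture preference probability
    return -np.log(np.clip(p, 1e-12, None)).mean()
```

Minimizing this loss over binary preference pairs lets the heads specialize to latent subgroups without any fine-grained labels, since the gate, not an annotator, assigns responsibility for each comparison.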

📝 Abstract
Reward modeling is a key step in building safe foundation models when applying reinforcement learning from human feedback (RLHF) to align Large Language Models (LLMs). However, reward modeling based on the Bradley-Terry (BT) model assumes a global reward function, failing to capture the inherently diverse and heterogeneous human preferences. Hence, such oversimplification limits LLMs from supporting personalization and pluralistic alignment. Theoretically, we show that when human preferences follow a mixture distribution of diverse subgroups, a single BT model has an irreducible error. While existing solutions, such as multi-objective learning with fine-grained annotations, help address this issue, they are costly and constrained by predefined attributes, failing to fully capture the richness of human values. In this work, we introduce MiCRo, a two-stage framework that enhances personalized preference learning by leveraging large-scale binary preference datasets without requiring explicit fine-grained annotations. In the first stage, MiCRo introduces a context-aware mixture modeling approach to capture diverse human preferences. In the second stage, MiCRo integrates an online routing strategy that dynamically adapts mixture weights based on the specific context to resolve ambiguity, allowing for efficient and scalable preference adaptation with minimal additional supervision. Experiments on multiple preference datasets demonstrate that MiCRo effectively captures diverse human preferences and significantly improves downstream personalization.
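The mixture-distribution claim in the abstract can be made concrete. Under the standard BT likelihood with $K$ latent subgroups (notation assumed here, not taken from the paper), the preference probability the first stage models is

$$
P(y_w \succ y_l \mid x) \;=\; \sum_{k=1}^{K} \pi_k(x)\, \sigma\!\big(r_k(x, y_w) - r_k(x, y_l)\big),
$$

where $\pi_k(x)$ are context-dependent mixture weights and $r_k$ is the reward function of subgroup $k$. A single BT model replaces the sum with one term $\sigma\!\big(r(x,y_w)-r(x,y_l)\big)$; whenever subgroups genuinely disagree, no single $r$ can match the mixture on every pair, which is the source of the irreducible error the paper analyzes.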
Problem

Research questions and friction points this paper is trying to address.

Capturing diverse human preferences in reward modeling
Overcoming limitations of global reward functions in RLHF
Enhancing personalization without fine-grained annotations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Context-aware mixture modeling for diverse preferences
Online routing strategy for dynamic weight adaptation
Two-stage framework for personalized preference learning
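The second-stage routing idea above can be mimicked with a multiplicative-weights (Hedge-style) update: each labeled comparison reweights the subgroup heads by how well they predicted it. This is an illustrative stand-in under assumed notation, not the paper's exact routing algorithm.

```python
import numpy as np

def hedge_update(weights, head_probs, label, lr=1.0):
    """One Hedge-style multiplicative update of mixture weights.

    Hypothetical online-routing sketch (not the paper's exact rule):
    weights:    (K,) current mixture weights, summing to 1
    head_probs: (K,) each head's probability that `chosen` beats `rejected`
    label:      1 if `chosen` was preferred by this user/context, else 0
    lr:         learning rate controlling how aggressively weights shift
    """
    p = np.where(label == 1, head_probs, 1.0 - head_probs)  # per-head likelihood
    loss = -np.log(np.clip(p, 1e-12, 1.0))                  # per-head log loss
    w = weights * np.exp(-lr * loss)                        # penalize poor heads
    return w / w.sum()                                      # renormalize
```

With a handful of such updates, probability mass concentrates on the heads that best explain the current context, which is the sense in which routing needs only minimal additional supervision.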
Authors
Jingyan Shen (New York University)
Jiarui Yao (CS, UIUC)
Rui Yang (University of Illinois at Urbana-Champaign)
Yifan Sun (University of Illinois at Urbana-Champaign)
Feng Luo (Rice University)
Rui Pan (University of Illinois at Urbana-Champaign)
Tong Zhang (University of Illinois at Urbana-Champaign)
Han Zhao (University of Illinois at Urbana-Champaign)

Topics: Reinforcement Learning, Machine Learning, Large Language Models