CoPL: Collaborative Preference Learning for Personalizing LLMs

📅 2025-03-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing LLM personalization methods suffer from limited flexibility and generalizability in modeling user preferences. To address this, we propose a graph-augmented collaborative preference learning framework: (1) constructing a user–response bipartite graph to capture cross-user preference correlations via graph neural networks and collaborative filtering; (2) introducing a novel LoRA-based mixture-of-experts architecture that jointly learns shared preference representations and user-specific adaptations; and (3) incorporating an optimization-free adaptation mechanism enabling zero-shot transfer. Evaluated on UltraFeedback-P, our method significantly outperforms existing personalized reward models—accurately distinguishing consensus versus contentious preferences, enhancing robustness of preference estimation under sparse annotations, and maintaining high efficiency and scalability.
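As a rough illustration of step (1), consider one mean-aggregation message-passing round on the user–response bipartite graph. This is a minimal sketch, not the paper's implementation; the function name and the adjacency-list encoding are hypothetical:

```python
import numpy as np

def aggregate_user_embeddings(user_edges, response_emb):
    """One message-passing round on a user-response bipartite graph:
    each user's embedding is the mean of the embeddings of the
    responses that user has annotated (illustrative sketch)."""
    num_users = len(user_edges)
    dim = response_emb.shape[1]
    user_emb = np.zeros((num_users, dim))
    for u, responses in user_edges.items():
        # users with overlapping annotation neighborhoods end up nearby,
        # which is what lets preferences propagate across users
        user_emb[u] = response_emb[list(responses)].mean(axis=0)
    return user_emb

# toy example: 2 users, 3 responses, 2-D embeddings
response_emb = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
edges = {0: [0, 2], 1: [1, 2]}  # both users annotated response 2
user_emb = aggregate_user_embeddings(edges, response_emb)
```

Stacking such rounds (alternating user-to-response and response-to-user aggregation) is the standard way graph collaborative filtering shares signal among sparsely annotating users.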

📝 Abstract
Personalizing large language models (LLMs) is important for aligning outputs with diverse user preferences, yet existing methods struggle with flexibility and generalization. We propose CoPL (Collaborative Preference Learning), a graph-based collaborative filtering framework that models user-response relationships to enhance preference estimation, particularly in sparse annotation settings. By integrating a mixture of LoRA experts, CoPL efficiently fine-tunes LLMs while dynamically balancing shared and user-specific preferences. Additionally, an optimization-free adaptation strategy enables generalization to unseen users without fine-tuning. Experiments on UltraFeedback-P demonstrate that CoPL outperforms existing personalized reward models, effectively capturing both common and controversial preferences, making it a scalable solution for personalized LLM alignment.
Problem

Research questions and friction points this paper is trying to address.

Personalizing LLMs to align with diverse user preferences
Enhancing preference estimation in sparse annotation settings
Generalizing to unseen users without fine-tuning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Graph-based collaborative filtering for preference estimation
Mixture of LoRA experts for efficient LLM fine-tuning
Optimization-free adaptation for generalization to unseen users
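The mixture-of-LoRA-experts idea above can be sketched as a frozen shared weight plus a user-gated sum of low-rank updates. This is an illustrative sketch under assumed shapes and names, not the paper's architecture:

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # numerically stable softmax
    e = np.exp(z)
    return e / e.sum()

def lora_moe_forward(x, W, experts, gate_logits):
    """Shared frozen weight W plus a gated mixture of low-rank
    LoRA updates B_k @ A_k; gate_logits would come from a user
    embedding (illustrative sketch only)."""
    gates = softmax(gate_logits)           # per-user mixture weights
    out = W @ x                            # shared preference path
    for g, (A, B) in zip(gates, experts):
        out = out + g * (B @ (A @ x))      # user-specific rank-r update
    return out

rng = np.random.default_rng(0)
d, r, k = 4, 2, 3                          # dim, LoRA rank, num experts
W = rng.normal(size=(d, d))
experts = [(rng.normal(size=(r, d)), rng.normal(size=(d, r))) for _ in range(k)]
x = rng.normal(size=d)
y = lora_moe_forward(x, W, experts, gate_logits=np.array([0.2, 1.5, -0.3]))
```

Under this view, the optimization-free adaptation for an unseen user amounts to computing new gate weights from that user's few annotations (e.g., via the graph embedding above) rather than fine-tuning any expert parameters.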
Youngbin Choi
Pohang University of Science and Technology
Machine Learning
Seunghyuk Cho
POSTECH
Generative Models · Hyperbolic Spaces · VAE · Crowdsourcing
Minjong Lee
POSTECH
Machine Learning
M. Park
Graduate School of Artificial Intelligence, POSTECH
Yesong Ko
Department of Computer Science and Engineering, POSTECH
Jungseul Ok
Associate Professor, CSE/AI, POSTECH
Reinforcement Learning · Machine Learning
Dongwoo Kim
Graduate School of Artificial Intelligence, POSTECH; Department of Computer Science and Engineering, POSTECH