🤖 AI Summary
Existing LLM personalization methods suffer from limited flexibility and generalizability in modeling user preferences. To address this, we propose a graph-augmented collaborative preference learning framework that (1) constructs a user–response bipartite graph to capture cross-user preference correlations via graph neural networks and collaborative filtering; (2) introduces a LoRA-based mixture-of-experts architecture that jointly learns shared preference representations and user-specific adaptations; and (3) incorporates an optimization-free adaptation mechanism that enables zero-shot transfer to unseen users. Evaluated on UltraFeedback-P, our method significantly outperforms existing personalized reward models: it accurately distinguishes consensus from contentious preferences, remains robust in preference estimation under sparse annotations, and maintains high efficiency and scalability.
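To make point (2) concrete, here is a minimal sketch of a linear layer augmented with a mixture of LoRA experts whose gate is conditioned on a user embedding. This is an illustrative assumption about the general technique, not the paper's exact architecture: all names, shapes, and the softmax gating are hypothetical, and the zero-initialized `B` matrices follow the common LoRA convention that the adapter starts as a no-op over the frozen base weights.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

class LoRAMoELinear:
    """Frozen base linear map plus a user-gated mixture of LoRA experts.

    Shared preferences live in the base weights and the expert pool;
    user-specific behavior comes only from the gating over experts.
    """

    def __init__(self, d_in, d_out, n_experts=4, rank=8, d_user=16):
        self.W = rng.standard_normal((d_in, d_out)) * 0.02   # frozen base weight
        self.A = rng.standard_normal((n_experts, d_in, rank)) * 0.01
        self.B = np.zeros((n_experts, rank, d_out))          # zero-init: adapter starts as identity
        self.G = rng.standard_normal((d_user, n_experts)) * 0.1  # gating weights

    def forward(self, x, user_emb):
        # x: (batch, d_in); user_emb: (batch, d_user)
        gate = softmax(user_emb @ self.G)                    # (batch, n_experts)
        # per-expert low-rank update: x @ A_e @ B_e for each expert e
        delta = np.einsum('bi,eir,ero->beo', x, self.A, self.B)
        # mix expert outputs with the user-specific gate
        return x @ self.W + np.einsum('be,beo->bo', gate, delta)
```

Because `B` is zero-initialized, the layer initially reproduces the frozen base exactly; training then moves only the small `A`, `B`, and gate parameters, which is what makes this style of fine-tuning parameter-efficient.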
📝 Abstract
Personalizing large language models (LLMs) is important for aligning outputs with diverse user preferences, yet existing methods struggle with flexibility and generalization. We propose CoPL (Collaborative Preference Learning), a graph-based collaborative filtering framework that models user-response relationships to enhance preference estimation, particularly in sparse annotation settings. By integrating a mixture of LoRA experts, CoPL efficiently fine-tunes LLMs while dynamically balancing shared and user-specific preferences. Additionally, an optimization-free adaptation strategy enables generalization to unseen users without fine-tuning. Experiments on UltraFeedback-P demonstrate that CoPL outperforms existing personalized reward models and effectively captures both common and controversial preferences, making it a scalable solution for personalized LLM alignment.
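The optimization-free adaptation strategy can be illustrated with a toy sketch. The mechanism below is an assumption for intuition only, not CoPL's actual procedure: an unseen user is embedded as a similarity-weighted average of seen-user embeddings, where similarity is agreement on a small probe set of pairwise preference labels (+1 / -1, with 0 for unlabeled items). No gradient steps are taken, which is what "without fine-tuning" refers to.

```python
import numpy as np

def adapt_new_user(new_labels, seen_labels, seen_embs):
    """Embed an unseen user without optimization (hypothetical sketch).

    new_labels:  (n_items,)          probe preferences of the new user
    seen_labels: (n_seen, n_items)   preferences of already-embedded users
    seen_embs:   (n_seen, d_user)    learned embeddings of seen users
    """
    # agreement score between the new user and each seen user
    sims = seen_labels @ new_labels            # (n_seen,)
    # softmax over agreement scores -> mixture weights
    w = np.exp(sims - sims.max())
    w = w / w.sum()
    # the new user's embedding is a convex combination of seen embeddings
    return w @ seen_embs                       # (d_user,)
```

A user whose probe answers match one existing annotator closely will inherit an embedding near that annotator's, so the downstream reward model can serve them zero-shot.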