Personalized LLM Decoding via Contrasting Personal Preference

📅 2025-06-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current personalized text generation with large language models (LLMs) relies heavily on prompt engineering or full-parameter fine-tuning, both of which suffer from inflexibility, high computational cost, or dependence on labeled data. To address this, the paper proposes CoPe, a lightweight decoding-time personalization method built on top of parameter-efficient fine-tuning (PEFT). CoPe integrates contrastive preference modeling into the decoding process, implicitly guiding generation with user-specific reward signals while requiring no external reward models, no human annotations, and no training beyond the PEFT step. Its core contribution is reward-guided, personalized control applied entirely at inference time. Evaluated on five open-ended generation tasks, CoPe improves ROUGE-L by an average of 10.57%, strengthening personalization while preserving generation quality and inference efficiency.
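
The "implicit reward signal" mentioned above is not spelled out on this page. One plausible reading, assumed here in the style of the DPO implicit reward rather than taken from the paper, is the log-probability ratio between the PEFT-adapted user model $\pi_u$ and the base reference model $\pi_{\text{ref}}$, used to rescore candidate tokens at each decoding step:

$$
r_u(y_t \mid x, y_{<t}) = \log \frac{\pi_u(y_t \mid x, y_{<t})}{\pi_{\text{ref}}(y_t \mid x, y_{<t})},
\qquad
y_t^{\ast} = \arg\max_{y_t} \Big[ \log \pi_u(y_t \mid x, y_{<t}) + \alpha \, r_u(y_t \mid x, y_{<t}) \Big]
$$

where $\alpha \ge 0$ controls how strongly the user's preference is contrasted against the base model; $\alpha = 0$ recovers standard decoding from the PEFT model.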

📝 Abstract
As large language models (LLMs) are progressively deployed in various real-world applications, personalization of LLMs has become increasingly important. While various approaches to LLM personalization such as prompt-based and training-based methods have been actively explored, the development of effective decoding-time algorithms remains largely overlooked, despite their demonstrated potential. In this paper, we propose CoPe (Contrasting Personal Preference), a novel decoding-time approach applied after performing parameter-efficient fine-tuning (PEFT) on user-specific data. Our core idea is to leverage reward-guided decoding specifically for personalization by maximizing each user's implicit reward signal. We evaluate CoPe across five open-ended personalized text generation tasks. Our empirical results demonstrate that CoPe achieves strong performance, improving personalization by an average of 10.57% in ROUGE-L, without relying on external reward models or additional training procedures.
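
A minimal sketch of how such reward-guided, contrast-based decoding could be implemented on top of a LoRA-adapted model, assuming the log-ratio reward above; the model name, adapter path, and `alpha` weight are placeholders, and the paper's exact scoring rule may differ:

```python
# Sketch of decoding that contrasts a PEFT (LoRA) user model against its base model.
# Assumptions: BASE_MODEL / USER_ADAPTER are placeholders; the scoring rule is a
# DPO-style log-ratio reward, not necessarily the paper's exact formulation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_MODEL = "meta-llama/Llama-3.2-1B"       # hypothetical base checkpoint
USER_ADAPTER = "path/to/user_lora_adapter"   # LoRA weights fit on one user's data

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)
model = PeftModel.from_pretrained(model, USER_ADAPTER)
model.eval()

@torch.no_grad()
def personalized_decode(prompt: str, max_new_tokens: int = 64, alpha: float = 1.0) -> str:
    """Greedy decoding where each candidate token is scored by the user-adapted
    log-probability plus alpha times the log-ratio against the base model."""
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    for _ in range(max_new_tokens):
        user_logits = model(input_ids=ids).logits[:, -1, :]      # adapter enabled
        with model.disable_adapter():                            # same weights, adapter off
            base_logits = model(input_ids=ids).logits[:, -1, :]
        user_logp = torch.log_softmax(user_logits, dim=-1)
        base_logp = torch.log_softmax(base_logits, dim=-1)
        score = user_logp + alpha * (user_logp - base_logp)      # contrast personal preference
        next_id = score.argmax(dim=-1, keepdim=True)
        ids = torch.cat([ids, next_id], dim=-1)
        if next_id.item() == tokenizer.eos_token_id:
            break
    return tokenizer.decode(ids[0], skip_special_tokens=True)

print(personalized_decode("Write a short movie review in my usual style:"))
```

The two forward passes per step (adapter on and off) roughly double per-token cost but keep memory to a single model, since the LoRA adapter is toggled in place rather than loading a second checkpoint.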
Problem

Research questions and friction points this paper is trying to address.

Develop decoding-time algorithms for personalized LLMs
Maximize user-specific implicit reward signals
Improve personalization without external reward models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Contrasting Personal Preference for decoding
Reward-guided decoding without external models
Parameter-efficient fine-tuning for personalization
Authors
Hyungjune Bu
Korea University
Chanjoo Jung
Yonsei University
Minjae Kang
Seoul National University
Reinforcement Learning, Robotic Manipulation
Jaehyung Kim
Yonsei University