🤖 AI Summary
To address the high annotation and computational costs of conventional LLM personalization via supervised fine-tuning or RLHF, this paper proposes a zero-training, decoding-time preference alignment framework. It decomposes users’ implicit preferences into interpretable combinations of predefined attributes and performs lightweight, dynamic, gradient-free personalization via soft alignment during decoding, without updating any model parameters. Leveraging only 50–100 samples of implicit user feedback (e.g., clicks or dwell time), the method achieves effective output customization. On the Perspective and PRISM benchmarks, it significantly outperforms RLHF baselines while reducing computational overhead by over 90%. Its core contribution is the first decoding-time, attribute-based preference modeling paradigm, combining efficiency, interpretability, and zero-training operation.
📝 Abstract
Personalized alignment for individual users has been a long-standing goal in large language models (LLMs). We introduce Drift, a novel framework that personalizes LLMs at decoding time with implicit user preferences. Traditional Reinforcement Learning from Human Feedback (RLHF) requires thousands of annotated examples and expensive gradient updates. In contrast, Drift personalizes LLMs in a training-free manner, using only a few dozen examples to steer a frozen model through efficient preference modeling. Our approach models user preferences as a composition of predefined, interpretable attributes and aligns them at decoding time to enable personalized generation. Experiments on both a synthetic persona dataset (Perspective) and a real human-annotated dataset (PRISM) demonstrate that Drift significantly outperforms RLHF baselines while using only 50-100 examples. Our results and analysis show that Drift is both computationally efficient and interpretable.
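To make the decoding-time idea concrete, here is a minimal, self-contained sketch of attribute-composed logit steering in the spirit described above. All names, logit values, and the exact combination rule are hypothetical illustrations, not the authors' implementation: it assumes that for each interpretable attribute we can obtain attribute-conditioned next-token logits (e.g., from the same frozen model under an attribute-specific prompt), and that per-user attribute weights have already been estimated from a few dozen implicit-feedback examples.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def steered_logits(base_logits, attr_logits, weights):
    """Gradient-free decoding-time steering (illustrative only).

    Each attribute contributes its logit offset from the frozen base
    model, scaled by the user's inferred preference weight. No model
    parameters are touched; only the next-token distribution shifts.
    """
    steered = list(base_logits)
    for attr, w in weights.items():
        for i, a in enumerate(attr_logits[attr]):
            steered[i] += w * (a - base_logits[i])
    return steered

# Toy 4-token vocabulary with made-up logits.
vocab = ["plain", "formal", "friendly", "concise"]
base = [1.0, 0.5, 0.5, 0.2]
attrs = {
    "formality":    [0.5, 2.0, 0.0, 0.2],
    "friendliness": [0.5, 0.0, 2.0, 0.2],
}
# In Drift's setting these weights would come from ~50-100 implicit
# feedback signals; here they are fixed by hand for illustration.
user_weights = {"formality": 0.2, "friendliness": 0.9}

probs = softmax(steered_logits(base, attrs, user_weights))
best = vocab[probs.index(max(probs))]
print(best)  # the high friendliness weight pulls decoding toward "friendly"
```

Because the steering is a per-step reweighting of next-token distributions, swapping in a different user only means swapping the small weight dictionary, which is what makes this style of personalization cheap relative to per-user fine-tuning.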