Inference-Time Personalized Alignment with a Few User Preference Queries

📅 2025-11-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the high annotation cost and reliance on textual input in personalizing generative models via preference alignment. We propose UserAlign, an efficient alignment method that uses only a few pairwise response comparisons at inference time. The core idea is to treat user feedback as consistent and noise-free and to combine it with best-arm identification theory from the logistic bandits framework, enabling rapid personalized selection from a fixed pool of candidate responses. Crucially, users do not need to provide textual feedback: only 3–5 binary comparisons suffice to significantly improve output alignment with individual preferences. We validate its effectiveness across several tasks involving personalized text and image generation. Compared to baseline methods, our approach reduces query complexity by over 70% while maintaining or even improving alignment quality.

📝 Abstract

We study the problem of aligning a generative model's response with a user's preferences. Recent works have proposed several different formulations for personalized alignment; however, they either require a large number of user preference queries or require that the preference be explicitly specified as a text input. In this paper, we propose a novel inference-time personalized alignment method, UserAlign, that elicits the user's preferences with a few queries in the form of pairwise response comparisons. In particular, UserAlign builds on the theoretical framework of best-arm identification in logistic bandits and selects a personalized response from a fixed pool of the model's generated responses. The key idea is to treat the user's feedback as consistent and noise-free, and to incorporate it into the theoretical framework to identify the best response quickly. Experimental results across several tasks, involving personalized text and image generation, showcase the effectiveness of UserAlign in achieving personalized alignment.
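To make the idea of selecting from a fixed pool with a few noise-free pairwise comparisons concrete, here is a minimal Python sketch. It is not the UserAlign algorithm: it swaps in a simple version-space elimination over randomly sampled preference vectors, and it assumes each response has a feature vector and that the user's preference is linear in those features. The names `select_response`, `ask_user`, `pool`, `budget`, and `n_hypotheses` are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def best_index(theta, pool):
    """Index of the pool response that scores highest under preference vector theta."""
    return max(range(len(pool)), key=lambda i: dot(theta, pool[i]))


def select_response(pool, ask_user, budget=5, n_hypotheses=500, seed=0):
    """Pick a response index using at most `budget` pairwise queries.

    Assumptions (not from the paper): responses are feature vectors, the user's
    preference is linear in those features, and answers are consistent and
    noise-free, so one answer can safely discard every sampled preference
    vector that disagrees with it.
    """
    rng = random.Random(seed)
    dim = len(pool[0])
    # Sampled candidate preference vectors (a crude stand-in for a confidence set).
    hypotheses = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_hypotheses)]

    for _ in range(budget):
        votes = Counter(best_index(t, pool) for t in hypotheses)
        if len(votes) < 2:
            break  # all surviving hypotheses agree on the best response
        # Compare the two responses that the surviving hypotheses favor most.
        (i, _), (j, _) = votes.most_common(2)
        winner, loser = (i, j) if ask_user(i, j) else (j, i)
        survivors = [t for t in hypotheses
                     if dot(t, pool[winner]) > dot(t, pool[loser])]
        if not survivors:
            break  # sampled set exhausted; keep the last non-empty set
        hypotheses = survivors

    votes = Counter(best_index(t, pool) for t in hypotheses)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy pool of three responses with 2-D features and a simulated user.
    toy_pool = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
    hidden_theta = [-1.0, -2.0]  # the simulated user's hidden preference

    def simulated_user(i, j):
        # True if the simulated user prefers response i over response j.
        return dot(hidden_theta, toy_pool[i]) > dot(hidden_theta, toy_pool[j])

    print(select_response(toy_pool, simulated_user))
```

Because feedback is assumed noise-free, each answer permanently removes every inconsistent hypothesis, which is why a handful of queries can be enough in this toy setting. With noisy feedback, hard elimination like this would be unsafe and a likelihood-based update would be needed instead.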

Problem

Research questions and friction points this paper is trying to address.

Aligning generative model responses with user preferences efficiently

Reducing user preference queries through pairwise response comparisons

Achieving personalized alignment without explicit text specification

Innovation

Methods, ideas, or system contributions that make the work stand out.

Inference-time personalized alignment method

Uses few pairwise response comparison queries

Applies best-arm identification from logistic bandits

🔎 Similar Papers

PAD: Personalized Alignment of LLMs at Decoding-Time