🤖 AI Summary
This work addresses the limitations of existing personalization methods for large language models, which rely on point estimates to infer user preferences under sparse data, often yielding inaccurate and unreliable weights. To overcome this, the authors propose the Variational Reward Factorization (VRF) framework, the first to introduce uncertainty-aware modeling into reward decomposition. VRF employs a variational encoder to represent user preferences as probability distributions in a shared latent space and leverages the Wasserstein distance to align these distributions with probabilistic preference basis functions, producing robust user-specific weights. Additionally, a variance-decay loss is designed to suppress the influence of high-uncertainty estimates. Experiments demonstrate that VRF significantly outperforms current approaches across three benchmarks, exhibiting strong performance for both seen and unseen users, in few-shot settings, and under varying uncertainty conditions, while also enhancing downstream alignment effectiveness.
📝 Abstract
Reward factorization personalizes large language models (LLMs) by decomposing rewards into shared basis functions and user-specific weights. Yet existing methods estimate each user's weights in isolation from scarce data, as deterministic point estimates, leading to inaccurate and unreliable inference. We introduce Variational Reward Factorization (VRF), an uncertainty-aware framework that represents each user's preferences as a variational distribution in a shared preference space. VRF infers user distributions via a variational encoder, derives weights through Wasserstein-distance matching with shared probabilistic bases, and downweights uncertain estimates through a variance-attenuated loss. On three benchmarks, VRF outperforms all baselines across seen and unseen users, few-shot scenarios, and varying uncertainty levels, with gains extending to downstream alignment.
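The abstract's pipeline (represent a user as a distribution, match it against probabilistic bases via Wasserstein distance, attenuate the loss by uncertainty) can be sketched concretely. This is a minimal illustration, not the paper's implementation: it assumes diagonal Gaussians (for which the squared 2-Wasserstein distance has a closed form), a softmax-over-negative-distance rule for deriving weights, and an exponential variance-decay factor; the paper's exact encoder, matching rule, and loss may differ.

```python
import numpy as np

def w2_sq_diag_gauss(mu1, var1, mu2, var2):
    """Squared 2-Wasserstein distance between diagonal Gaussians:
    W2^2 = ||mu1 - mu2||^2 + ||sqrt(var1) - sqrt(var2)||^2."""
    return np.sum((mu1 - mu2) ** 2) + np.sum((np.sqrt(var1) - np.sqrt(var2)) ** 2)

def basis_weights(user_mu, user_var, basis_mus, basis_vars, temperature=1.0):
    """Derive user-specific weights over probabilistic preference bases.
    Illustrative rule: softmax of negative W2^2 distances, so bases whose
    distributions are closer to the user's get larger weight."""
    dists = np.array([w2_sq_diag_gauss(user_mu, user_var, m, v)
                      for m, v in zip(basis_mus, basis_vars)])
    logits = -dists / temperature
    logits -= logits.max()          # numerical stability
    w = np.exp(logits)
    return w / w.sum()

def variance_attenuated_loss(per_user_loss, user_var, decay=1.0):
    """Hypothetical variance-decay term: shrink the contribution of users
    whose inferred preference distribution is highly uncertain."""
    return np.exp(-decay * np.mean(user_var)) * per_user_loss
```

For example, a user whose inferred mean sits exactly on one basis receives most of its weight mass from that basis, and a user with large posterior variance contributes less to the training loss than a confidently inferred one.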