Personalizing LLMs with Binary Feedback: A Preference-Corrected Optimization Framework

Date: 2026-05-11
Citations: 0
Influential: 0

AI Summary
This work addresses a limitation of existing personalization methods for large language models (LLMs): they often overlook inter-user differences and struggle to use binary feedback to model individual preferences. The authors propose the C-BPO framework, which treats a target user's data as positive samples and other users' data as implicit negative samples. A preference calibration mechanism explicitly models user-specific discrepancies. Drawing on Positive-Unlabeled (PU) learning theory, the framework also corrects the positive bias inherent in the constructed negative samples, that is, the shared preferences that other users' data has in common with the target user. This approach enhances personalization while preserving the model's general capabilities. Experiments across diverse tasks and backbone LLMs show that C-BPO consistently outperforms current baselines, supporting its effectiveness in disentangling user-specific preferences from shared knowledge.
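As a rough illustration of the data construction described above, the sketch below labels one user's samples as positive and pools every other user's samples as implicit negatives. The function name `build_binary_feedback`, the `histories` dictionary layout, and the 1/0 labels are assumptions made for this example; the summary does not specify how C-BPO represents its data.

```python
def build_binary_feedback(histories, target_user):
    """Split per-user histories into positives and implicit negatives.

    `histories` is a hypothetical mapping from user id to a list of samples.
    The target user's samples get label 1; all other users' samples get label 0.
    """
    positives = [(sample, 1) for sample in histories[target_user]]
    implicit_negatives = [
        (sample, 0)
        for user, samples in histories.items()
        if user != target_user
        for sample in samples
    ]
    return positives, implicit_negatives


histories = {
    "alice": ["a1", "a2"],
    "bob": ["b1"],
    "carol": ["c1"],
}
positives, implicit_negatives = build_binary_feedback(histories, "alice")
```

Because other users may share some of the target user's preferences, the pooled negatives in this sketch are noisy. That overlap is the positive bias the framework is said to correct.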
Abstract
Large Language Model (LLM) personalization aims to align model behaviors with individual user preferences. Existing methods often focus on isolated user histories, neglecting the essential role of inter-user differences. We propose C-BPO, a framework that personalizes LLMs via preference-calibrated binary signals. By treating target user data as positive feedback and other users' data as an auxiliary set of implicit negative signals, C-BPO captures distinct inter-user differences. To mitigate the preference overlap issue, where shared task knowledge is erroneously penalized, we derive an objective grounded in Positive-Unlabeled (PU) learning theory. This approach purifies negative signals by subtracting "positive bias", ensuring alignment with unique idiosyncrasies without compromising general helpfulness. Empirical experiments across various personalization tasks and backbone LLMs show C-BPO consistently outperforms baselines, demonstrating the efficacy of preference-calibrated binary signals in modeling inter-user differences.
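The abstract does not state the C-BPO objective itself, so the sketch below shows only the standard PU risk estimator from the general PU-learning literature, which may help make "subtracting positive bias" concrete. Other users' data is treated as unlabeled rather than purely negative, and the share of its loss attributable to hidden positives is subtracted. The scalar preference scores, the logistic loss, the `prior` parameter (the assumed fraction of other users' data that matches the target user's preferences), and all function names are assumptions for illustration, not the paper's formulation.

```python
import math


def logistic_loss(score, label):
    """Logistic loss log(1 + exp(-label * score)) for label in {+1, -1}."""
    z = -label * score
    # Numerically stable softplus of z.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def mean(values):
    return sum(values) / len(values)


def pu_risk(pos_scores, unl_scores, prior, non_negative=True):
    """Generic PU risk estimate (not the C-BPO objective).

    pos_scores: scores for the target user's (positive) samples.
    unl_scores: scores for other users' samples, treated as unlabeled.
    prior: assumed fraction of unlabeled samples that are actually positive.
    """
    pos_as_pos = mean([logistic_loss(s, +1) for s in pos_scores])
    pos_as_neg = mean([logistic_loss(s, -1) for s in pos_scores])
    unl_as_neg = mean([logistic_loss(s, -1) for s in unl_scores])
    # Subtract the loss that hidden positives contribute to the negative term.
    neg_part = unl_as_neg - prior * pos_as_neg
    if non_negative:
        # Clamp at zero, as in the non-negative PU variant.
        neg_part = max(neg_part, 0.0)
    return prior * pos_as_pos + neg_part
```

With `prior=0.0` the estimate reduces to penalizing every other-user sample as a plain negative. A larger `prior` removes more of that penalty, which is the sense in which shared task knowledge is no longer penalized.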
Problem

Research questions and friction points this paper is trying to address.

LLM personalization
inter-user differences
binary feedback
preference alignment
user preferences
Innovation

Methods, ideas, or system contributions that make the work stand out.

Preference Calibration
Binary Feedback
Inter-user Differences
Positive-Unlabeled Learning
LLM Personalization