🤖 AI Summary
Human annotators struggle to distinguish between similar trajectories in preference labeling, leading to low label efficiency and poor generalization in offline preference-based reinforcement learning (PbRL). To address this, we propose CLARIFY, the first framework to integrate contrastive learning into offline PbRL. CLARIFY constructs a trajectory embedding space infused with preference information, explicitly disentangling ambiguous preferences and enhancing the model’s ability to identify query-level ambiguity. It employs a pairwise preference loss to optimize the embedding structure, yielding semantically clear and interpretable trajectory representations. Experiments under both non-ideal teacher feedback and real human feedback show that CLARIFY significantly outperforms existing baselines: it improves query discriminability by 32%, while achieving higher labeling efficiency and superior policy generalization.
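To make the idea of a preference-aware embedding space concrete, the following is a minimal sketch of a generic margin-based contrastive loss over pairs of segment embeddings. It is not CLARIFY's actual objective, which this summary does not spell out. The function name `contrastive_ambiguity_loss`, the `is_clear` flag, and the `margin` parameter are assumptions made for illustration: pairs the annotator could clearly rank are pushed at least `margin` apart, and pairs marked ambiguous are pulled together.

```python
import math


def contrastive_ambiguity_loss(emb_a, emb_b, is_clear, margin=1.0):
    """Generic margin-based contrastive loss (illustrative, not the paper's loss).

    emb_a, emb_b: sequences of embedding vectors, one pair per query.
    is_clear:     sequence of booleans, True if the annotator gave a clear
                  preference for that query, False if the query was ambiguous.
    margin:       hypothetical minimum distance for clearly ranked pairs.
    """
    total = 0.0
    for z_a, z_b, clear in zip(emb_a, emb_b, is_clear):
        dist = math.dist(z_a, z_b)  # Euclidean distance between the two embeddings
        if clear:
            # Penalize clearly ranked pairs that sit closer than the margin.
            total += max(0.0, margin - dist) ** 2
        else:
            # Penalize ambiguous pairs in proportion to how far apart they are.
            total += dist ** 2
    return total / len(is_clear)
```

Under this sketch, the loss is zero when every clearly ranked pair is at least `margin` apart and every ambiguous pair coincides, so distance in the embedding space can serve as a rough proxy for how easy a query is to label.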
📝 Abstract
Preference-based reinforcement learning (PbRL) bypasses explicit reward engineering by inferring reward functions from human preference comparisons, enabling better alignment with human intentions. However, humans often struggle to label a clear preference between similar segments, reducing label efficiency and limiting PbRL's real-world applicability. To address this, we propose an offline PbRL method: Contrastive LeArning for ResolvIng Ambiguous Feedback (CLARIFY), which learns a trajectory embedding space that incorporates preference information, ensuring that clearly distinguished segments are spaced apart and thus facilitating the selection of less ambiguous queries. Extensive experiments demonstrate that CLARIFY outperforms baselines in both non-ideal teacher and real human feedback settings. Our approach not only selects more clearly distinguishable queries but also learns meaningful trajectory embeddings.
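As background for how a reward function can be inferred from preference comparisons, the sketch below uses the Bradley-Terry model that is standard in PbRL, followed by a simple heuristic that selects the candidate queries whose embeddings lie farthest apart. The abstract does not give CLARIFY's exact reward model or selection rule, so all names here (`reward_fn`, `preference_loss`, `select_distinct_queries`) and the distance-based ranking are assumptions for illustration only.

```python
import math


def segment_return(reward_fn, segment):
    """Sum of predicted rewards over a segment of (state, action) pairs."""
    return sum(reward_fn(state, action) for state, action in segment)


def preference_prob(reward_fn, seg_0, seg_1):
    """Bradley-Terry probability that seg_1 is preferred over seg_0."""
    diff = segment_return(reward_fn, seg_1) - segment_return(reward_fn, seg_0)
    return 1.0 / (1.0 + math.exp(-diff))


def preference_loss(reward_fn, queries):
    """Mean cross-entropy between predicted and labeled preferences.

    queries: list of (seg_0, seg_1, label), with label 1.0 if seg_1 is
             preferred, 0.0 if seg_0 is preferred, 0.5 if judged equal.
    """
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for seg_0, seg_1, label in queries:
        p_1 = preference_prob(reward_fn, seg_0, seg_1)
        total -= label * math.log(p_1 + eps) + (1.0 - label) * math.log(1.0 - p_1 + eps)
    return total / len(queries)


def select_distinct_queries(embeddings, candidate_pairs, k):
    """Hypothetical selection rule: keep the k pairs farthest apart in embedding space.

    embeddings:      mapping from segment id to embedding vector.
    candidate_pairs: list of (segment_id_a, segment_id_b).
    """
    ranked = sorted(
        candidate_pairs,
        key=lambda pair: math.dist(embeddings[pair[0]], embeddings[pair[1]]),
        reverse=True,
    )
    return ranked[:k]
```

In this sketch, minimizing `preference_loss` over the parameters of `reward_fn` fits the reward model to the labels, while `select_distinct_queries` stands in for choosing queries that an annotator is more likely to be able to rank.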