CLARIFY: Contrastive Preference Reinforcement Learning for Untangling Ambiguous Queries

📅 2025-05-31


🤖 AI Summary

Human annotators struggle to distinguish between similar trajectory segments when labeling preferences, which lowers label efficiency and hurts generalization in offline preference-based reinforcement learning (PbRL). To address this, we propose CLARIFY, the first framework to integrate contrastive learning into offline PbRL. CLARIFY constructs a trajectory embedding space infused with preference information, so that clearly distinguishable segments lie far apart and ambiguous queries become easier to identify. It employs a pairwise preference loss to shape the embedding structure, yielding interpretable trajectory representations. Experiments with both non-ideal teacher feedback and real human feedback show that CLARIFY outperforms existing baselines: it improves query discriminability by 32% while achieving higher labeling efficiency and better policy generalization.

📝 Abstract

Preference-based reinforcement learning (PbRL) bypasses explicit reward engineering by inferring reward functions from human preference comparisons, enabling better alignment with human intentions. However, humans often struggle to state a clear preference between similar segments, which reduces label efficiency and limits PbRL's real-world applicability. To address this, we propose an offline PbRL method, Contrastive LeArning for ResolvIng Ambiguous Feedback (CLARIFY), which learns a trajectory embedding space that incorporates preference information. In this space, clearly distinguishable segments are spaced apart, which makes it easier to select unambiguous queries. Extensive experiments demonstrate that CLARIFY outperforms baselines in both non-ideal teacher and real human feedback settings. Our approach not only selects more distinguishable queries but also learns meaningful trajectory embeddings.
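
To make the phrase "inferring reward functions from human preference comparisons" concrete, the following is a minimal sketch of the Bradley-Terry preference model that PbRL methods commonly use. The abstract does not state which preference model CLARIFY uses, so this is an assumption for illustration only; the function names `preference_prob` and `preference_loss`, and the convention of labeling an undecided comparison as 0.5, are hypothetical.

```python
import math

def preference_prob(rewards_a, rewards_b):
    """Probability that segment a is preferred over segment b.

    Assumes a Bradley-Terry model on summed predicted rewards
    (a common PbRL choice, not confirmed for CLARIFY).
    """
    diff = sum(rewards_a) - sum(rewards_b)
    return 1.0 / (1.0 + math.exp(-diff))

def preference_loss(rewards_a, rewards_b, label):
    """Cross-entropy between the predicted preference and the label.

    label = 1.0 if a is preferred, 0.0 if b is preferred,
    0.5 if the annotator could not decide (illustrative convention).
    """
    p = preference_prob(rewards_a, rewards_b)
    eps = 1e-12  # guards against log(0)
    return -(label * math.log(p + eps) + (1.0 - label) * math.log(1.0 - p + eps))
```

Under this model, two segments with nearly equal predicted returns give a probability close to 0.5, which is the regime where annotators tend to hesitate and where a label carries little training signal.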

Problem

Research questions and friction points this paper is trying to address.

Addresses ambiguity in human preference labels for reinforcement learning

Improves label efficiency in preference-based reward learning

Enhances trajectory embedding clarity for better query selection

Innovation

Methods, ideas, or system contributions that make the work stand out.

Contrastive learning for ambiguous feedback resolution

Offline preference-based reinforcement learning method

Trajectory embedding space with preference information
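
The three ideas listed above can be sketched together as follows. This is not the paper's loss or selection rule, which this page does not give; it swaps in a generic margin-based contrastive loss and a simple farthest-pair selection heuristic. All names (`distance`, `contrastive_loss`, `select_queries`) and the `margin` parameter are hypothetical.

```python
import math
from itertools import combinations

def distance(u, v):
    """Euclidean distance between two segment embeddings."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def contrastive_loss(z_a, z_b, clear, margin=1.0):
    """Generic margin-based contrastive loss (illustrative assumption).

    clear=True  : the pair received a clear preference label, so the
                  embeddings are pushed at least `margin` apart.
    clear=False : the pair was ambiguous, so the embeddings are pulled together.
    """
    d = distance(z_a, z_b)
    if clear:
        return max(0.0, margin - d) ** 2
    return d ** 2

def select_queries(embeddings, k):
    """Return the k index pairs whose embeddings are farthest apart.

    Heuristic: pairs that are far apart in the learned space are assumed
    to be easier for an annotator to tell apart.
    """
    pairs = combinations(range(len(embeddings)), 2)
    ranked = sorted(
        pairs,
        key=lambda p: distance(embeddings[p[0]], embeddings[p[1]]),
        reverse=True,
    )
    return ranked[:k]
```

For example, with embeddings `[[0, 0], [0.1, 0], [3, 4]]`, `select_queries(embeddings, 1)` picks the pair `(0, 2)`, while the nearly identical segments 0 and 1 are ranked last and would not be sent to the annotator.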

🔎 Similar Papers

Learning to Clarify: Multi-turn Conversations with Action-Based Contrastive Self-Training