🤖 AI Summary
Existing preference-based reinforcement learning (PbRL) methods for personalization in human-robot interaction (HRI) require training policies from scratch, resulting in inefficient use of human feedback. To address this, we propose Preference-based Action Representation Learning (PbARL), a framework that efficiently fine-tunes pre-trained policies without degrading original task performance. PbARL decouples task-invariant features from user-specific preferences and performs domain adaptation from the source domain to a preference-aligned target domain via mutual information maximization, without requiring prior knowledge of the source domain. This work is the first to integrate action representation learning with PbRL, unifying policy personalization and task preservation. Evaluated on the Assistive Gym benchmark and an 8-participant user study, PbARL achieves significantly higher personalization accuracy and user satisfaction than state-of-the-art methods while retaining over 98.2% of the original task performance.
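
To make the fine-tuning setup concrete, here is a minimal, hypothetical PyTorch sketch, not the paper's implementation: the `PretrainedPolicy` and `PersonalizationHead` modules, the residual-correction design, and all shapes are assumptions. It illustrates the pattern the summary describes, where the pre-trained policy is frozen and used only as a reference while a small personalization module is trained.

```python
# Sketch (assumptions throughout, not the authors' code): the pre-trained
# policy stays frozen and only proposes reference actions; a small trainable
# head adapts those actions toward user preferences.
import torch
import torch.nn as nn

class PretrainedPolicy(nn.Module):
    """Stand-in for a frozen, pre-trained task policy."""
    def __init__(self, obs_dim: int, action_dim: int):
        super().__init__()
        self.net = nn.Linear(obs_dim, action_dim)

    @torch.no_grad()  # reference only: no gradients reach the source policy
    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return torch.tanh(self.net(obs))

class PersonalizationHead(nn.Module):
    """Trainable module mapping reference actions to preference-aligned ones."""
    def __init__(self, obs_dim: int, action_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + action_dim, 128), nn.ReLU(),
            nn.Linear(128, action_dim),
        )

    def forward(self, obs: torch.Tensor, ref_action: torch.Tensor) -> torch.Tensor:
        # Residual correction keeps behavior close to the task-competent policy,
        # which is one plausible way to preserve original task performance.
        return ref_action + self.net(torch.cat([obs, ref_action], dim=-1))

if __name__ == "__main__":
    obs_dim, action_dim = 10, 7
    policy, head = PretrainedPolicy(obs_dim, action_dim), PersonalizationHead(obs_dim, action_dim)
    obs = torch.randn(32, obs_dim)
    ref = policy(obs)                      # frozen reference actions
    personalized = head(obs, ref)          # only the head receives gradients
    print(personalized.shape)              # torch.Size([32, 7])
```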
📝 Abstract
Preference-based reinforcement learning (PbRL) has shown significant promise for personalization in human-robot interaction (HRI) by explicitly integrating human preferences into the robot learning process. However, existing practices often require training a personalized robot policy from scratch, resulting in inefficient use of human feedback. In this paper, we propose Preference-based Action Representation Learning (PbARL), an efficient fine-tuning method that decouples common task structure from user preferences by leveraging pre-trained robot policies. Instead of directly fine-tuning the pre-trained policy with human preferences, PbARL uses it as a reference for an action representation learning task that maximizes the mutual information between the pre-trained source domain and the preference-aligned target domain. This approach allows the robot to personalize its behaviors while preserving original task performance, and it eliminates the need for extensive prior information about the source domain, enhancing efficiency and practicality in real-world HRI scenarios. Empirical results on the Assistive Gym benchmark and a real-world user study (N=8) demonstrate the benefits of our method compared to state-of-the-art approaches.
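
The abstract does not specify which mutual-information estimator is used, but a common choice for maximizing a lower bound on MI between two domains is InfoNCE. The sketch below is a hedged illustration under that assumption; the `ActionEncoder`, latent dimension, and temperature are hypothetical, not the paper's design.

```python
# Sketch (not the authors' code): an InfoNCE-style lower bound on mutual
# information between actions from the frozen source policy and
# preference-aligned target actions for the same states. Minimizing this
# loss maximizes the MI lower bound.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ActionEncoder(nn.Module):
    """Maps raw actions to a latent space shared by both domains."""
    def __init__(self, action_dim: int, latent_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(action_dim, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),
        )

    def forward(self, a: torch.Tensor) -> torch.Tensor:
        return self.net(a)

def info_nce(z_source: torch.Tensor, z_target: torch.Tensor,
             temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE loss over a batch: matching rows are positive pairs,
    all other rows serve as in-batch negatives."""
    z_source = F.normalize(z_source, dim=-1)
    z_target = F.normalize(z_target, dim=-1)
    logits = z_source @ z_target.t() / temperature   # (B, B) similarities
    labels = torch.arange(z_source.size(0), device=z_source.device)
    return F.cross_entropy(logits, labels)

if __name__ == "__main__":
    action_dim, batch = 7, 32
    enc = ActionEncoder(action_dim)
    a_src = torch.randn(batch, action_dim)           # from frozen source policy
    a_tgt = a_src + 0.1 * torch.randn_like(a_src)    # stand-in for aligned actions
    loss = info_nce(enc(a_src), enc(a_tgt))
    loss.backward()
    print(f"InfoNCE loss: {loss.item():.4f}")
```

In this reading, driving the loss down pulls latents of corresponding source and target actions together while pushing apart non-corresponding pairs, which is one standard way to realize the cross-domain mutual-information maximization the abstract describes.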