🤖 AI Summary
To address the high cost and poor scalability of human feedback in preference-based reinforcement learning, this paper proposes PrefVLM, the first framework to integrate vision-language models (VLMs) into preference learning. It introduces an uncertainty-aware automated preference labeling mechanism, coupled with selective human annotation so that human effort is spent only on cases where the VLM is unsure. Furthermore, a self-supervised inverse dynamics loss is designed for task-adaptive fine-tuning of the VLM, enabling cross-task knowledge transfer. Evaluated on Meta-World manipulation tasks, PrefVLM matches or surpasses state-of-the-art performance while reducing human annotation effort by up to 50%. The fine-tuned VLM significantly improves feedback efficiency and generalization on unseen tasks. Overall, PrefVLM establishes a paradigm for low-feedback, robust preference learning, bridging scalable automation with targeted human oversight.
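To make the labeling mechanism concrete, here is a minimal Python sketch of an uncertainty-gated loop of the kind the summary describes, assuming a VLM scorer that returns a preference probability for a pair of trajectory segments. The names `vlm_score`, `query_human`, and the threshold `tau` are illustrative stand-ins, not the paper's actual API.

```python
def label_preferences(segment_pairs, vlm_score, query_human, tau=0.8):
    """Label segment pairs with the VLM, deferring uncertain ones to a human.

    vlm_score(seg_a, seg_b) -> probability that seg_a is preferred, in [0, 1].
    query_human(seg_a, seg_b) -> 0.0 or 1.0 from a human annotator.
    tau -- confidence threshold above which the VLM label is trusted.
    """
    labels = []
    for seg_a, seg_b in segment_pairs:
        p = vlm_score(seg_a, seg_b)
        confidence = max(p, 1.0 - p)  # distance from the 0.5 decision boundary
        if confidence >= tau:
            labels.append(1.0 if p > 0.5 else 0.0)    # trust the VLM label
        else:
            labels.append(query_human(seg_a, seg_b))  # defer uncertain pairs
    return labels
```

Under this scheme, raising `tau` trades more human queries for higher label quality, which is the knob that lets automation and targeted oversight be balanced.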
📝 Abstract
Preference-based reinforcement learning (RL) offers a promising approach for aligning policies with human intent but is often constrained by the high cost of human feedback. In this work, we introduce PrefVLM, a framework that integrates Vision-Language Models (VLMs) with selective human feedback to significantly reduce annotation requirements while maintaining performance. Our method leverages VLMs to generate initial preference labels, which are then filtered to identify uncertain cases for targeted human annotation. Additionally, we adapt VLMs using a self-supervised inverse dynamics loss to improve alignment with evolving policies. Experiments on Meta-World manipulation tasks demonstrate that PrefVLM achieves comparable or superior success rates to state-of-the-art methods while using up to 2× fewer human annotations. Furthermore, we show that adapted VLMs enable efficient knowledge transfer across tasks, further minimizing feedback needs. Our results highlight the potential of combining VLMs with selective human supervision to make preference-based RL more scalable and practical.
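The inverse dynamics adaptation also lends itself to a compact illustration. Below is a minimal PyTorch sketch of a self-supervised inverse dynamics objective consistent with the abstract's description: an auxiliary head predicts the action taken between consecutive observations from the VLM's visual embeddings, and the regression loss fine-tunes the encoder without any preference labels. The `InverseDynamicsHead` module, the MLP sizes, and the MSE regression target are assumptions for illustration, not the paper's reported architecture.

```python
import torch
import torch.nn as nn

class InverseDynamicsHead(nn.Module):
    """Predicts the action a_t from embeddings of frames (o_t, o_{t+1})."""
    def __init__(self, embed_dim, action_dim):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * embed_dim, 256), nn.ReLU(),
            nn.Linear(256, action_dim),
        )

    def forward(self, z_t, z_next):
        # Concatenate consecutive-frame embeddings and regress the action.
        return self.mlp(torch.cat([z_t, z_next], dim=-1))

def inverse_dynamics_loss(vlm_encoder, head, obs_t, obs_next, actions):
    """Self-supervised loss; gradients flow into the VLM's visual encoder,
    adapting its features to the current task."""
    z_t, z_next = vlm_encoder(obs_t), vlm_encoder(obs_next)
    pred_actions = head(z_t, z_next)
    return nn.functional.mse_loss(pred_actions, actions)
```

Because the supervision signal (the executed action) comes for free from the agent's own rollouts, this objective can keep the VLM's features aligned with the evolving policy at no extra annotation cost.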