PrefCLM: Enhancing Preference-based Reinforcement Learning with Crowdsourced Large Language Models

๐Ÿ“… 2024-07-11
๐Ÿ›๏ธ arXiv.org
๐Ÿ“ˆ Citations: 2
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
This study addresses key limitations in preference-based reinforcement learning: the difficulty of manually designing teacher scripts to capture personalized human-robot interaction preferences, heavy reliance on extensive human feedback, and neglect of inter-user heterogeneity in expectations. We propose an adaptive preference learning framework leveraging crowd-sourced large language models (LLMs). Our contributions are twofold: (1) a novel multi-LLM preference score fusion mechanism grounded in Dempsterโ€“Shafer evidence theory, enabling robust and interpretable preference aggregation; and (2) an iterative human-in-the-loop collective refinement pipeline that supports dynamic alignment with evolving user preferences and behavior-aware optimization. Empirically, our method matches expert-scripted teachers on standard RL benchmarks. A real-user study (N=10) demonstrates statistically significant improvements in perceived robot behavioral naturalness (p<0.01) and user satisfaction (+32.7%).

๐Ÿ“ Abstract
Preference-based reinforcement learning (PbRL) is emerging as a promising approach to teaching robots through human comparative feedback, sidestepping the need for complex reward engineering. However, the substantial volume of feedback required in existing PbRL methods often leads to reliance on synthetic feedback generated by scripted teachers. This approach necessitates intricate reward engineering again and struggles to adapt to the nuanced preferences particular to human-robot interaction (HRI) scenarios, where users may have unique expectations toward the same task. To address these challenges, we introduce PrefCLM, a novel framework that utilizes crowdsourced large language models (LLMs) as simulated teachers in PbRL. We utilize Dempster-Shafer Theory to fuse individual preferences from multiple LLM agents at the score level, efficiently leveraging their diversity and collective intelligence. We also introduce a human-in-the-loop pipeline that facilitates collective refinements based on user interactive feedback. Experimental results across various general RL tasks show that PrefCLM achieves competitive performance compared to traditional scripted teachers and excels in facilitating more natural and efficient behaviors. A real-world user study (N=10) further demonstrates its capability to tailor robot behaviors to individual user preferences, significantly enhancing user satisfaction in HRI scenarios.
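The score-level fusion the abstract describes can be sketched with Dempster's rule of combination over the two-hypothesis frame {A, B} (segment A preferred vs. segment B). This is a minimal illustrative sketch: the mass construction from (score, confidence) pairs, the example confidence values, and the pignistic readout are assumptions for illustration, not the paper's exact formulation.

```python
from functools import reduce

def mass_from_score(score, confidence):
    """Turn one agent's preference score in [0, 1] into a Dempster-Shafer
    mass over {A}, {B}, and {A, B} (the uncertain mass).
    Assumed construction: confidence scales committed belief."""
    return {"A": confidence * score,
            "B": confidence * (1.0 - score),
            "AB": 1.0 - confidence}

def combine(m1, m2):
    """Dempster's rule of combination for the two-hypothesis frame:
    multiply compatible masses and renormalize by 1 - conflict."""
    conflict = m1["A"] * m2["B"] + m1["B"] * m2["A"]
    norm = 1.0 - conflict
    if norm <= 0:
        raise ValueError("total conflict between agents")
    return {
        "A": (m1["A"] * m2["A"] + m1["A"] * m2["AB"] + m1["AB"] * m2["A"]) / norm,
        "B": (m1["B"] * m2["B"] + m1["B"] * m2["AB"] + m1["AB"] * m2["B"]) / norm,
        "AB": (m1["AB"] * m2["AB"]) / norm,
    }

def fuse_preferences(scored):
    """Fuse (score, confidence) pairs from several LLM agents into one
    preference label for A over B, splitting the residual uncertain
    mass evenly (pignistic transform)."""
    masses = [mass_from_score(s, c) for s, c in scored]
    fused = reduce(combine, masses)
    return fused["A"] + 0.5 * fused["AB"]

# Example: three agents, two leaning toward A, one nearly uncommitted.
label = fuse_preferences([(0.8, 0.9), (0.7, 0.6), (0.5, 0.3)])
```

Because Dempster's rule discounts conflicting evidence through the normalization term, a single outlier agent pulls the fused label less than a simple average of scores would, which is one motivation for score-level fusion over naive voting.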
Problem

Research questions and friction points this paper is trying to address.

Reinforcement Learning
Human Feedback
Adaptive Robotics
Innovation

Methods, ideas, or system contributions that make the work stand out.

PrefCLM
Virtual Teachers
Personalized Human-Robot Interaction
๐Ÿ”Ž Similar Papers
No similar papers found.
Ruiqi Wang
SMART Laboratory, Department of Computer and Information Technology, Purdue University, West Lafayette, IN, USA
Dezhong Zhao
College of Mechanical and Electrical Engineering, Beijing University of Chemical Technology, Beijing, China
Ziqin Yuan
Purdue University
Robotics
Ike Obi
Purdue University
Byung-Cheol Min
SMART Laboratory, Department of Computer and Information Technology, Purdue University, West Lafayette, IN, USA
Robotics, Human-Robot Interaction, Robot Learning, Multi-Robot Systems, Artificial Intelligence