PrefMMT: Modeling Human Preferences in Preference-based Reinforcement Learning with Multimodal Transformers

📅 2024-09-20
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
This work addresses inaccurate human preference modeling in preference-based reinforcement learning (PbRL), challenging the restrictive Markov assumption common in prior work. The authors propose the first multimodal Transformer architecture designed specifically for preference modeling. The method decouples state and action sequences as distinct modalities and employs a hierarchical design that jointly captures intra-modal temporal dependencies and inter-modal state–action interactions, yielding a non-Markovian, multimodal joint preference representation. Key innovations include state–action decoupled encoding, hierarchical temporal modeling, and a preference sequence learning mechanism. Evaluated on D4RL locomotion and Meta-World manipulation benchmarks, the approach significantly outperforms existing preference modeling methods, improving accuracy and robustness in aligning learned policies with human preferences.

📝 Abstract
Preference-based reinforcement learning (PbRL) shows promise in aligning robot behaviors with human preferences, but its success depends heavily on the accurate modeling of human preferences through reward models. Most methods adopt Markovian assumptions for preference modeling (PM), which overlook the temporal dependencies within robot behavior trajectories that impact human evaluations. While recent works have utilized sequence modeling to mitigate this by learning sequential non-Markovian rewards, they ignore the multimodal nature of robot trajectories, which consist of elements from two distinctive modalities: state and action. As a result, they often struggle to capture the complex interplay between these modalities that significantly shapes human preferences. In this paper, we propose a multimodal sequence modeling approach for PM by disentangling state and action modalities. We introduce a multimodal transformer network, named PrefMMT, which hierarchically leverages intra-modal temporal dependencies and inter-modal state-action interactions to capture complex preference patterns. We demonstrate that PrefMMT consistently outperforms state-of-the-art PM baselines on locomotion tasks from the D4RL benchmark and manipulation tasks from the Meta-World benchmark.
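The reward-model learning the abstract refers to is typically fit to pairwise human labels with a Bradley–Terry objective over summed segment rewards; in the non-Markovian setting described here, what changes is how the per-step rewards are produced (by a sequence model rather than a per-state network), not the objective itself. A minimal sketch in plain Python, with illustrative names not taken from the paper:

```python
import math

def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))

def preference_nll(rewards_a, rewards_b, prefer_a):
    """Bradley-Terry negative log-likelihood for one labelled segment pair.

    rewards_a, rewards_b: per-step rewards predicted for two trajectory
    segments. In a non-Markovian model such as PrefMMT, each step's reward
    may depend on the whole state-action history, but the loss is unchanged.
    prefer_a: 1.0 if the human preferred segment A, else 0.0.
    """
    # P(A preferred) = sigmoid(sum(r_a) - sum(r_b))
    margin = sum(rewards_a) - sum(rewards_b)
    return -(prefer_a * log_sigmoid(margin)
             + (1.0 - prefer_a) * log_sigmoid(-margin))
```

With equal segment returns the model assigns probability 0.5 to either label, so the loss is log 2; a larger return margin in favour of the labelled segment drives the loss toward zero.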
Problem

Research questions and friction points this paper is trying to address.

Accurately modeling human preferences over robot behavior trajectories.
Capturing both temporal dependencies and state–action (multimodal) interactions within trajectories.
Improving preference-based reinforcement learning through multimodal transformer-based preference modeling.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal transformer for preference modeling
Disentangles state and action modalities
Hierarchical intra-modal and inter-modal interactions
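Read together, these bullets describe a two-stage attention stack: embed states and actions separately, apply self-attention within each modality, then cross-attention between them before a per-step reward head. The NumPy toy below is a data-flow sketch only, with random weights and illustrative names; it is not the authors' implementation (which would use trained multi-head attention blocks, and likely causal masking for the temporal self-attention).

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    """Single-head scaled dot-product attention (no masking, for brevity)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

class PrefMMTSketch:
    """Illustrative sketch of the hierarchical multimodal idea:
    intra-modal self-attention per modality, then inter-modal
    state-action cross-attention, then a per-step reward head."""

    def __init__(self, state_dim, action_dim, d=16, seed=0):
        rng = np.random.default_rng(seed)
        self.Ws = rng.normal(size=(state_dim, d)) / np.sqrt(state_dim)
        self.Wa = rng.normal(size=(action_dim, d)) / np.sqrt(action_dim)
        self.w_out = rng.normal(size=(2 * d,)) / np.sqrt(2 * d)

    def __call__(self, states, actions):
        s = states @ self.Ws       # embed state tokens, shape (T, d)
        a = actions @ self.Wa      # embed action tokens, shape (T, d)
        s = attention(s, s, s)     # intra-modal temporal dependencies
        a = attention(a, a, a)
        s2 = attention(s, a, a)    # inter-modal: states attend to actions
        a2 = attention(a, s, s)    # inter-modal: actions attend to states
        h = np.concatenate([s2, a2], axis=-1)
        return h @ self.w_out      # non-Markovian per-step rewards, shape (T,)
```

Because every per-step output is computed through attention over the full sequence, each reward depends on the whole trajectory segment rather than on a single state-action pair, which is the non-Markovian property the summary emphasizes.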
👥 Authors

Dezhong Zhao
College of Mechanical and Electrical Engineering, Beijing University of Chemical Technology, Beijing, China
Ruiqi Wang
SMART Laboratory, Department of Computer and Information Technology, Purdue University, West Lafayette, IN, USA
Dayoon Suh
Purdue University
Taehyeon Kim
SMART Laboratory, Department of Computer and Information Technology, Purdue University, West Lafayette, IN, USA
Ziqin Yuan
Purdue University
Byung-Cheol Min
Professor of Computer Science and Intelligent Systems Engineering, Indiana University Bloomington
Guohua Chen
College of Mechanical and Electrical Engineering, Beijing University of Chemical Technology, Beijing, China