QMP: Q-switch Mixture of Policies for Multi-Task Behavior Sharing

📅 2023-02-01
📈 Citations: 4
Influential: 2
🤖 AI Summary
Multi-task reinforcement learning (MTRL) suffers from low sample efficiency, and existing behavior policy sharing approaches lack a principled mechanism for deciding which behaviors to share. To address this, the paper proposes a Q-value-driven cross-task policy sharing paradigm: dynamically selecting and mixing policies from other tasks based on each task's own Q-function, thereby improving the quality of off-policy data collection. The key idea is to integrate Q-value evaluation into both the policy mixing and switching mechanisms, enabling selective, principled behavior sharing. The authors provide theoretical analysis of how the method improves the sample efficiency of the underlying RL algorithm. Empirically, the approach is evaluated across diverse domains, including manipulation, locomotion, and navigation, and shows consistent improvements over popular MTRL methods and various behavior-sharing baselines, supporting its generalizability and complementary benefits.
📝 Abstract
Multi-task reinforcement learning (MTRL) aims to learn several tasks simultaneously for better sample efficiency than learning them separately. Traditional methods achieve this by sharing parameters or relabeled data between tasks. In this work, we introduce a new framework for sharing behavioral policies across tasks, which can be used in addition to existing MTRL methods. The key idea is to improve each task's off-policy data collection by employing behaviors from other task policies. Selectively sharing helpful behaviors acquired in one task to collect training data for another task can lead to higher-quality trajectories, leading to more sample-efficient MTRL. Thus, we introduce a simple and principled framework called Q-switch mixture of policies (QMP) that selectively shares behavior between different task policies by using the task's Q-function to evaluate and select useful shareable behaviors. We theoretically analyze how QMP improves the sample efficiency of the underlying RL algorithm. Our experiments show that QMP's behavioral policy sharing provides complementary gains over many popular MTRL algorithms and outperforms alternative ways to share behaviors in various manipulation, locomotion, and navigation environments. Videos are available at https://qmp-mtrl.github.io.
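The abstract's core mechanism, using a task's Q-function to evaluate and select shareable behaviors from other tasks' policies, can be illustrated with a minimal sketch. All names (`policies`, `q_functions`, `qmp_behavior_action`) and the toy Gaussian policies are hypothetical stand-ins, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_policy(bias):
    # Toy stochastic policy: samples actions near a task-specific bias.
    return lambda state: bias + 0.1 * rng.standard_normal()

def make_q(target):
    # Toy Q-function: prefers actions close to a task-specific target.
    return lambda state, action: -(action - target) ** 2

# One policy and one Q-function per task (three toy tasks).
policies = [make_policy(b) for b in (0.0, 1.0, 2.0)]
q_functions = [make_q(t) for t in (0.0, 1.0, 2.0)]

def qmp_behavior_action(task_id, state):
    """Q-switch idea: sample a candidate action from every task's
    policy, score each candidate with the current task's Q-function,
    and act with the highest-scoring one during data collection."""
    candidates = [pi(state) for pi in policies]
    scores = [q_functions[task_id](state, a) for a in candidates]
    return candidates[int(np.argmax(scores))]

# Collect a behavior action for task 1; its own Q-function arbitrates.
action = qmp_behavior_action(1, state=0.0)
```

In this toy setup, task 1's Q-function typically selects task 1's own proposal, but when another task's policy happens to propose a higher-value action, that behavior is shared instead, which is the selective sharing the paper describes.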
Problem

Research questions and friction points this paper is trying to address.

MTRL data collection is sample-inefficient when each task relies only on its own policy
Naively sharing behaviors across tasks can transfer unhelpful or harmful policies
How to identify which other tasks' behaviors are useful for a given task in manipulation, locomotion, and navigation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Shares behaviors across tasks using each task's own Q-function
Selectively improves off-policy data collection with shared behaviors
Combines task policies via the Q-switch mixture of policies (QMP) framework