Mastering Massive Multi-Task Reinforcement Learning via Mixture-of-Expert Decision Transformer

📅 2025-05-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
Offline multi-task reinforcement learning (MTRL) suffers severe performance degradation as task scale increases, exposing a critical scalability bottleneck. Method: This paper proposes M3DT, a MoE-enhanced Decision Transformer coupled with a three-stage progressive training paradigm. It introduces the Mixture-of-Experts (MoE) architecture into Decision Transformers for the first time, enabling task-aware sparse parameter activation. The training pipeline—comprising pretraining, task alignment, and policy fine-tuning—decouples parameter expansion from task expansion. Contribution/Results: Evaluated on a 160-task benchmark, M3DT achieves consistent performance gains with increasing model capacity, substantially outperforming existing MTRL approaches. The results demonstrate superior scalability and cross-task generalization, validating both the architectural innovation and the training strategy.

📝 Abstract
Although recent advancements in offline multi-task reinforcement learning (MTRL) have harnessed the powerful capabilities of the Transformer architecture, most approaches focus on a limited number of tasks, and scaling to extremely massive task sets remains a formidable challenge. In this paper, we first revisit the key impact of task number on current MTRL methods, and further reveal that naively expanding the parameters is insufficient to counteract the performance degradation as the number of tasks escalates. Building on these insights, we propose M3DT, a novel mixture-of-experts (MoE) framework that tackles task scalability by further unlocking the model's parameter scalability. Specifically, we enhance both the architecture and the optimization of the agent: we strengthen the Decision Transformer (DT) backbone with MoE to reduce the task load on each parameter subset, and introduce a three-stage training mechanism to facilitate efficient training with optimal performance. Experimental results show that, by increasing the number of experts, M3DT not only consistently improves performance as the model expands on a fixed number of tasks, but also exhibits remarkable task scalability, successfully extending to 160 tasks with superior performance.
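The sparse, task-aware activation the abstract attributes to the MoE backbone can be illustrated with standard top-k gated routing: score every expert, keep only the k highest-scoring ones, renormalize their gate probabilities, and mix their outputs. The sketch below is a minimal pure-Python illustration of that generic mechanism; the dot-product gate, expert count, and shapes are assumptions for demonstration, not the paper's implementation.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_forward(x, experts, gate_weights, k=2):
    """Route input x through the top-k experts and mix their outputs.

    Only k of the experts are evaluated, which is the sparse-activation
    idea: each input (e.g. each task's tokens) loads only a parameter
    subset rather than the full model.
    """
    # One gate logit per expert (here a simple dot product with x).
    logits = [sum(wi * xi for wi, xi in zip(w, x)) for w in gate_weights]
    probs = softmax(logits)
    # Keep only the k highest-probability experts.
    top = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    # Weighted sum of the selected experts' outputs.
    out = [0.0] * len(x)
    for i in top:
        y = experts[i](x)
        w = probs[i] / norm
        out = [o + w * yi for o, yi in zip(out, y)]
    return out, top
```

With four toy experts that simply scale the input, only the two experts favored by the gate contribute to the output, and the result lies between their individual outputs.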
Problem

Research questions and friction points this paper is trying to address.

Scaling offline multi-task reinforcement learning to massive tasks
Addressing performance degradation with increasing task numbers
Enhancing model scalability via mixture-of-experts framework
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mixture-of-Experts enhances Decision Transformer
Three-stage training ensures optimal performance
Scalable to 160 tasks with superior results
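The three-stage pipeline (pretraining, task alignment, policy fine-tuning) decouples parameter expansion from task expansion by training different parameter groups at different stages. The sketch below is one hypothetical reading of such a staged schedule; the group names and the per-stage freezing choices are illustrative assumptions, not the paper's specification.

```python
# Hypothetical parameter groups of an MoE Decision Transformer.
PARAM_GROUPS = ["backbone", "experts", "router", "heads"]

# Assumed per-stage trainable groups (illustrative, not the paper's spec):
# the shared DT backbone is trained first, experts are added and aligned
# against the frozen backbone, then routing and output heads are tuned.
STAGES = {
    "pretraining": {"backbone", "heads"},
    "task_alignment": {"experts"},
    "policy_finetuning": {"router", "heads"},
}

def trainable_params(stage):
    """Map each parameter group to whether it receives gradients in a stage."""
    active = STAGES[stage]
    return {g: (g in active) for g in PARAM_GROUPS}
```

The point of such a schedule is that adding experts (parameter expansion) never forces retraining of the shared backbone that earlier tasks rely on.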