MoRI: Mixture of RL and IL Experts for Long-Horizon Manipulation Tasks

📅 2026-04-11
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses two problems in complex, long-horizon manipulation tasks: the low sample efficiency of reinforcement learning (RL), and the error accumulation and distribution shift inherent in imitation learning (IL). The authors propose MoRI, a framework that switches dynamically between an IL expert and an RL expert based on the variance of expert actions. Operating within an offline pre-training and online fine-tuning paradigm, MoRI adaptively combines the IL and RL policies and applies IL-based regularization to the RL component to keep exploration safe. The system handles both coarse movements and fine-grained manipulation, achieving an average success rate of 97.5% across four complex real-world tasks. Compared to baseline RL methods, MoRI reduces human intervention by 85.8% and shortens convergence time by 21%.
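The variance-driven switching idea can be illustrated with a minimal Python sketch. This is not the authors' implementation: the summary only states that switching depends on the variance of expert actions. The function names, the use of several sampled IL actions per step, the threshold, and the direction of the rule (low variance keeps the IL expert, high variance hands control to the RL expert) are all assumptions made for illustration.

```python
from statistics import pvariance
from typing import Sequence

Action = Sequence[float]


def action_variance(samples: Sequence[Action]) -> float:
    """Mean per-dimension population variance across sampled actions."""
    dims = len(samples[0])
    per_dim = [pvariance([s[d] for s in samples]) for d in range(dims)]
    return sum(per_dim) / dims


def select_expert(il_samples: Sequence[Action], threshold: float) -> str:
    """Hypothetical rule: trust IL when its sampled actions agree.

    Assumption (not from the paper): low variance means the IL expert is
    confident, so it keeps control; high variance triggers the RL expert.
    """
    return "il" if action_variance(il_samples) <= threshold else "rl"


# Usage: two 2-D action samples per step, hypothetical threshold of 0.5.
agreeing = [[0.1, 0.2], [0.1, 0.2]]
disagreeing = [[0.0, 0.0], [2.0, 2.0]]
print(select_expert(agreeing, threshold=0.5))
print(select_expert(disagreeing, threshold=0.5))
```

A hard threshold is the simplest possible gate; a real system might smooth the decision over time to avoid rapid switching, which this sketch does not attempt.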

📝 Abstract
Reinforcement Learning (RL) and Imitation Learning (IL) are the standard frameworks for policy acquisition in manipulation. While IL offers efficient policy derivation, it suffers from compounding errors and distribution shift. Conversely, RL facilitates autonomous exploration but is frequently hindered by low sample efficiency and the high cost of trial and error. Since existing hybrid methods often struggle with complex tasks, we introduce Mixture of RL and IL Experts (MoRI). This system dynamically switches between IL and RL experts based on the variance of expert actions to handle coarse movements and fine-grained manipulations. MoRI employs an offline pre-training stage followed by online fine-tuning to accelerate convergence. To maintain exploration safety and minimize human intervention, the system applies IL-based regularization to the RL component. Evaluation across four complex real-world tasks shows that MoRI achieves an average success rate of 97.5% within 2 to 5 hours of fine-tuning. Compared to baseline RL algorithms, MoRI reduces human intervention by 85.8% and shortens convergence time by 21%, demonstrating its capability in robotic manipulation.
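One common way to express IL-based regularization of an RL policy is to add a penalty that keeps the RL action close to the IL action. The sketch below assumes a mean-squared-error penalty scaled by a hypothetical `weight`; the abstract does not specify the form MoRI uses, so this should be read as an illustration of the general idea rather than the paper's objective.

```python
from typing import Sequence


def il_regularized_loss(
    rl_loss: float,
    rl_action: Sequence[float],
    il_action: Sequence[float],
    weight: float,
) -> float:
    """RL loss plus a weighted penalty for deviating from the IL action.

    Assumption (not from the paper): the penalty is the mean squared
    difference between the two actions, and `weight` is a tunable scalar.
    """
    if len(rl_action) != len(il_action):
        raise ValueError("actions must have the same dimension")
    penalty = sum((a - b) ** 2 for a, b in zip(rl_action, il_action))
    penalty /= len(rl_action)
    return rl_loss + weight * penalty


# Usage: the penalty vanishes when the RL action matches the IL action.
print(il_regularized_loss(1.0, [0.3, 0.4], [0.3, 0.4], weight=0.5))
print(il_regularized_loss(1.0, [0.0, 2.0], [0.0, 0.0], weight=0.5))
```

A larger `weight` keeps exploration closer to demonstrated behavior, at the cost of limiting how far the RL component can improve on it.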
Problem

Research questions and friction points this paper is trying to address.

Reinforcement Learning
Imitation Learning
Long-Horizon Manipulation
Distribution Shift
Sample Efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mixture of Experts
Reinforcement Learning
Imitation Learning
Long-Horizon Manipulation
Offline-to-Online Learning
Yaohang Xu
School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, China
Lianjie Ma
School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, China
Gewei Zuo
School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, China
Wentao Zhang
Institute of Physics, Chinese Academy of Sciences
photoemission, superconductivity, cuprate, htsc, time-resolved
Han Ding
School of Mechanical Science and Engineering, Huazhong University of Science and Technology, Wuhan, China
Lijun Zhu
Purdue University