MoRI: Mixture of RL and IL Experts for Long-Horizon Manipulation Tasks

📅 2026-04-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work targets two weaknesses of learned policies on complex, long-horizon manipulation tasks: the low sample efficiency of reinforcement learning (RL), and the compounding errors and distribution shift of imitation learning (IL). The authors propose MoRI, a framework that switches dynamically between an IL expert and an RL expert based on the variance of the experts' actions. MoRI is pretrained offline and fine-tuned online, and it applies IL-based regularization to the RL component to keep exploration safe. The system handles both coarse movements and fine-grained manipulation, reaching an average success rate of 97.5% across four complex real-world tasks within 2 to 5 hours of fine-tuning. Compared to baseline RL methods, MoRI reduces human intervention by 85.8% and shortens convergence time by 21%.
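As a rough illustration of variance-driven expert switching, the sketch below picks an expert from the spread of several sampled IL actions. It is not the authors' implementation. The summary does not say how the variance is computed, what threshold is used, or which expert acts in which regime, so the sampling scheme, the `threshold` value, the rule "low variance selects IL, high variance selects RL", and the names `action_variance` and `select_expert` are all assumptions.

```python
from statistics import pvariance
from typing import Sequence


def action_variance(samples: Sequence[Sequence[float]]) -> float:
    """Mean per-dimension population variance across sampled actions.

    `samples` holds several action vectors proposed for the same state
    (hypothetical: e.g. repeated draws from a stochastic IL policy).
    """
    per_dimension = list(zip(*samples))  # transpose: one tuple per action dimension
    return sum(pvariance(dim) for dim in per_dimension) / len(per_dimension)


def select_expert(il_samples: Sequence[Sequence[float]], threshold: float = 0.01) -> str:
    """Return "IL" when the sampled actions agree, otherwise "RL".

    Assumption: the switching direction and the threshold are illustrative
    choices, not values taken from the paper.
    """
    return "IL" if action_variance(il_samples) <= threshold else "RL"


# Tightly clustered samples: the IL expert is treated as confident.
tight = [[0.10, 0.20], [0.11, 0.19], [0.09, 0.21]]
# Widely spread samples: control is handed to the RL expert.
spread = [[0.5, -0.5], [-0.5, 0.5], [0.0, 0.0]]

print(select_expert(tight))
print(select_expert(spread))
```

A scalar threshold on the averaged variance is the simplest possible gate. A real system might smooth the signal over time or add hysteresis to avoid rapid switching, which this sketch does not attempt.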

📝 Abstract

Reinforcement Learning (RL) and Imitation Learning (IL) are the standard frameworks for policy acquisition in manipulation. While IL offers efficient policy derivation, it suffers from compounding errors and distribution shift. Conversely, RL facilitates autonomous exploration but is frequently hindered by low sample efficiency and the high cost of trial and error. Since existing hybrid methods often struggle with complex tasks, we introduce Mixture of RL and IL Experts (MoRI). This system dynamically switches between IL and RL experts based on the variance of expert actions to handle coarse movements and fine-grained manipulations. MoRI employs an offline pre-training stage followed by online fine-tuning to accelerate convergence. To maintain exploration safety and minimize human intervention, the system applies IL-based regularization to the RL component. Evaluation across four complex real-world tasks shows that MoRI achieves an average success rate of 97.5% within 2 to 5 hours of fine-tuning. Compared to baseline RL algorithms, MoRI reduces human intervention by 85.8% and shortens convergence time by 21%, demonstrating its capability in robotic manipulation.
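One common way to apply IL-based regularization to an RL objective is to add a penalty that grows as the RL action drifts away from the IL action. The sketch below shows that idea only. The abstract does not give the form of MoRI's regularizer, so the squared-distance penalty, the `weight` coefficient, and the function names `il_penalty` and `regularized_loss` are assumptions made for illustration.

```python
from typing import Sequence


def il_penalty(rl_action: Sequence[float], il_action: Sequence[float]) -> float:
    """Squared Euclidean distance between the RL action and the IL action."""
    return sum((r - i) ** 2 for r, i in zip(rl_action, il_action))


def regularized_loss(
    rl_loss: float,
    rl_action: Sequence[float],
    il_action: Sequence[float],
    weight: float = 0.5,
) -> float:
    """RL loss plus a weighted penalty for deviating from the IL action.

    Assumption: `weight` is a hypothetical hyperparameter; a larger value
    keeps the RL policy closer to the demonstrated behavior.
    """
    return rl_loss + weight * il_penalty(rl_action, il_action)


# The RL action differs from the IL action by 1.0 in the first dimension,
# so the penalty is 1.0 and the total loss is 1.0 + 0.5 * 1.0.
print(regularized_loss(1.0, [1.0, 0.0], [0.0, 0.0], weight=0.5))  # 1.5
```

Under this assumed form, the penalty vanishes when both experts agree, so the regularizer constrains exploration only where the RL policy departs from demonstrated behavior.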

Problem

Research questions and friction points this paper is trying to address.

Reinforcement Learning

Imitation Learning

Long-Horizon Manipulation

Distribution Shift

Sample Efficiency

Innovation

Methods, ideas, or system contributions that make the work stand out.

Mixture of Experts

Reinforcement Learning

Imitation Learning