MoRE: Unlocking Scalability in Reinforcement Learning for Quadruped Vision-Language-Action Models

📅 2025-03-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenge of multi-task generalization for quadrupedal robots, this work proposes a unified Vision–Language–Action (VLA) framework. Methodologically, the authors (1) design a sparsely activated “Mixture of Robotic Experts” (MoRE) architecture that integrates low-rank adaptation (LoRA) modules into a multimodal large language model; (2) introduce the first end-to-end reinforcement learning paradigm formalized via Q-functions, which couples the VLA model closely with the task structure; and (3) perform efficient knowledge distillation and fine-tuning using automatically collected, mixed-quality embodied data. Experiments show that the approach consistently outperforms baselines across six embodied locomotion and manipulation skills, achieves significant improvements in out-of-distribution generalization, and is successfully deployed on real-world quadrupedal robots, supporting its robustness, practicality, and scalability.

📝 Abstract
Developing versatile quadruped robots that can smoothly perform various actions and tasks in real-world environments remains a significant challenge. This paper introduces a novel vision-language-action (VLA) model for quadruped robots, mixture of robotic experts (MoRE), which aims to introduce reinforcement learning (RL) for fine-tuning large-scale VLA models with a large amount of mixed-quality data. MoRE integrates multiple low-rank adaptation modules as distinct experts within a dense multi-modal large language model (MLLM), forming a sparse-activated mixture-of-experts model. This design enables the model to effectively adapt to a wide array of downstream tasks. Moreover, we employ a reinforcement learning-based training objective to train our model as a Q-function after deeply exploring the structural properties of our tasks. Effective learning from automatically collected mixed-quality data enhances data efficiency and model performance. Extensive experiments demonstrate that MoRE outperforms all baselines across six different skills and exhibits superior generalization capabilities in out-of-distribution scenarios. We further validate our method in real-world scenarios, confirming the practicality of our approach and laying a solid foundation for future research on multi-task learning in quadruped robots.
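As a rough illustration of the idea of LoRA modules acting as sparsely activated experts on top of a frozen dense layer, here is a minimal sketch in plain Python. It is not the authors' implementation: the class names (`LoRAExpert`, `MoLoRALinear`), the linear router, the top-k softmax gating, and all sizes are assumptions chosen for illustration only.

```python
import math
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rng, rows, cols, scale=0.1):
    """Small random matrix used for toy weights."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class LoRAExpert:
    """Low-rank update delta_W = B @ A (hypothetical expert, rank << d_in)."""

    def __init__(self, rng, d_in, d_out, rank):
        self.A = rand_matrix(rng, rank, d_in)            # down-projection
        self.B = [[0.0] * rank for _ in range(d_out)]    # up-projection, zero-initialised

    def __call__(self, x):
        return matvec(self.B, matvec(self.A, x))


class MoLoRALinear:
    """Frozen dense layer plus a top-k routed mixture of LoRA experts (assumed design)."""

    def __init__(self, d_in, d_out, num_experts=4, rank=2, top_k=2, seed=0):
        rng = random.Random(seed)
        self.W = rand_matrix(rng, d_out, d_in)              # frozen base weight
        self.router = rand_matrix(rng, num_experts, d_in)   # one logit per expert
        self.experts = [LoRAExpert(rng, d_in, d_out, rank) for _ in range(num_experts)]
        self.top_k = top_k

    def route(self, x):
        """Return (expert_index, gate) pairs for the top-k experts only."""
        logits = matvec(self.router, x)
        order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
        chosen = order[: self.top_k]
        gates = softmax([logits[i] for i in chosen])
        return list(zip(chosen, gates))

    def __call__(self, x):
        out = matvec(self.W, x)                 # dense path, always active
        for idx, gate in self.route(x):         # sparse path, only k experts run
            delta = self.experts[idx](x)
            out = [o + gate * d for o, d in zip(out, delta)]
        return out
```

Calling `MoLoRALinear(d_in=4, d_out=3)` on a 4-element input evaluates only `top_k` of the experts per input, which is the sense in which the mixture is sparse. Because each expert's `B` matrix starts at zero, the layer initially reproduces the frozen base output, a common LoRA initialisation choice.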
Problem

Research questions and friction points this paper is trying to address.

Develop versatile quadruped robots for real-world tasks
Fine-tune large-scale VLA models using reinforcement learning
Enhance data efficiency and model performance with mixed-quality data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mixture of robotic experts (MoRE) for quadruped robots
Reinforcement learning fine-tunes large-scale VLA models
Sparse-activated mixture-of-experts model enhances task adaptation
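To make the phrase "train the model as a Q-function" more concrete, the following is a minimal sketch of a generic one-step temporal-difference objective over a discrete action set. It is an assumption-laden illustration, not the paper's actual loss: the function names `td_target` and `td_loss`, the discount `gamma`, and the toy numbers are all hypothetical. One reason such objectives suit mixed-quality data is that the update uses logged transitions and rewards rather than requiring every logged action to be expert-level.

```python
def td_target(reward, next_q_values, done, gamma=0.99):
    """One-step Bellman target: r + gamma * max_a' Q(s', a'), or just r at episode end."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)


def td_loss(q_values, action, reward, next_q_values, done, gamma=0.99):
    """Squared TD error for the action actually taken in a logged transition."""
    target = td_target(reward, next_q_values, done, gamma)
    return (q_values[action] - target) ** 2


# Toy transition (hypothetical values): three candidate actions, action 1 was taken.
loss = td_loss(
    q_values=[0.2, 0.5, 0.1],
    action=1,
    reward=1.0,
    next_q_values=[0.0, 0.3, 0.4],
    done=False,
    gamma=0.9,
)
```

In a VLA setting, `q_values` would come from the model's scores over candidate action tokens for the current observation and instruction; here they are plain lists so the arithmetic is easy to follow.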
Authors

Han Zhao
Zhejiang University, China; MiLAB, Westlake University, China
Wenxuan Song
The Hong Kong University of Science and Technology (Guangzhou)
Vision-language-action Model, Robotics
Donglin Wang
MiLAB, Westlake University, China
Xinyang Tong
MiLAB, Westlake University, China
Pengxiang Ding
Zhejiang University
Human Motion Prediction, Large Language Model, Embodied AI
Xuelian Cheng
Monash University
3D Vision, Medical Imaging, Machine Learning
Zongyuan Ge
AIM Lab, Faculty of IT, Monash University, Australia