COMPASS: Cross-embodiment Mobility Policy via Residual RL and Skill Synthesis

📅 2025-02-22
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Addressing the challenge of poor policy generalization in cross-robot embodiment transfer—caused by coupled covariate shift, sparse sampling, and stringent physical constraints—this paper proposes a three-stage universal locomotion policy learning framework. First, a platform-specific base policy is acquired via imitation learning (IL). Second, residual reinforcement learning (Residual RL) is integrated with a world model to jointly optimize environmental adaptability and sample efficiency. Third, cross-embodiment representation alignment and policy distillation enable robust, compact policy transfer across morphologically diverse robots. This work is the first to end-to-end integrate IL, Residual RL, and distillation for deployable universal policies on heterogeneous robotic platforms. Experiments demonstrate that our method achieves approximately fivefold higher task success rates than baseline IL across multiple real-world robots, significantly improving cross-morphology transfer performance and physical feasibility.

Technology Category

Application Category

📝 Abstract
As robots are increasingly deployed in diverse application domains, generalizable cross-embodiment mobility policies are increasingly essential. While classical mobility stacks have proven effective on specific robot platforms, they pose significant challenges when scaling to new embodiments. Learning-based methods, such as imitation learning (IL) and reinforcement learning (RL), offer alternative solutions but suffer from covariate shift, sparse sampling in large environments, and embodiment-specific constraints. This paper introduces COMPASS, a novel workflow for developing cross-embodiment mobility policies by integrating IL, residual RL, and policy distillation. We begin with IL on a mobile robot, leveraging easily accessible teacher policies to train a foundational model that combines a world model with a mobility policy. Building on this base, we employ residual RL to fine-tune embodiment-specific policies, exploiting pre-trained representations to improve sampling efficiency in handling various physical constraints and sensor modalities. Finally, policy distillation merges these embodiment-specialist policies into a single robust cross-embodiment policy. We empirically demonstrate that COMPASS scales effectively across diverse robot platforms while maintaining adaptability to various environment configurations, achieving a generalist policy with a success rate approximately 5X higher than the pre-trained IL policy. The resulting framework offers an efficient, scalable solution for cross-embodiment mobility, enabling robots with different designs to navigate safely and efficiently in complex scenarios.
Problem

Research questions and friction points this paper is trying to address.

Develop cross-embodiment mobility policies
Overcome covariate shift and sparse sampling
Integrate IL, residual RL, and policy distillation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates imitation and reinforcement learning
Employs residual RL for fine-tuning
Uses policy distillation for robustness
🔎 Similar Papers
No similar papers found.