COMPASS: Cross-embodiment Mobility Policy via Residual RL and Skill Synthesis

📅 2025-02-22

📈 Citations: 0

✨ Influential: 0

career value

217K/year

🤖 AI Summary

Addressing the challenge of poor policy generalization in cross-robot embodiment transfer—caused by coupled covariate shift, sparse sampling, and stringent physical constraints—this paper proposes a three-stage universal locomotion policy learning framework. First, a platform-specific base policy is acquired via imitation learning (IL). Second, residual reinforcement learning (Residual RL) is integrated with a world model to jointly optimize environmental adaptability and sample efficiency. Third, cross-embodiment representation alignment and policy distillation enable robust, compact policy transfer across morphologically diverse robots. This work is the first to end-to-end integrate IL, Residual RL, and distillation for deployable universal policies on heterogeneous robotic platforms. Experiments demonstrate that our method achieves approximately fivefold higher task success rates than baseline IL across multiple real-world robots, significantly improving cross-morphology transfer performance and physical feasibility.

Technology Category

Application Category

📝 Abstract

As robots are increasingly deployed in diverse application domains, generalizable cross-embodiment mobility policies are increasingly essential. While classical mobility stacks have proven effective on specific robot platforms, they pose significant challenges when scaling to new embodiments. Learning-based methods, such as imitation learning (IL) and reinforcement learning (RL), offer alternative solutions but suffer from covariate shift, sparse sampling in large environments, and embodiment-specific constraints. This paper introduces COMPASS, a novel workflow for developing cross-embodiment mobility policies by integrating IL, residual RL, and policy distillation. We begin with IL on a mobile robot, leveraging easily accessible teacher policies to train a foundational model that combines a world model with a mobility policy. Building on this base, we employ residual RL to fine-tune embodiment-specific policies, exploiting pre-trained representations to improve sampling efficiency in handling various physical constraints and sensor modalities. Finally, policy distillation merges these embodiment-specialist policies into a single robust cross-embodiment policy. We empirically demonstrate that COMPASS scales effectively across diverse robot platforms while maintaining adaptability to various environment configurations, achieving a generalist policy with a success rate approximately 5X higher than the pre-trained IL policy. The resulting framework offers an efficient, scalable solution for cross-embodiment mobility, enabling robots with different designs to navigate safely and efficiently in complex scenarios.

Problem

Research questions and friction points this paper is trying to address.

Develop cross-embodiment mobility policies

Overcome covariate shift and sparse sampling

Integrate IL, residual RL, and policy distillation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates imitation and reinforcement learning

Employs residual RL for fine-tuning

Uses policy distillation for robustness

🔎 Similar Papers

MLLM as Retriever: Interactively Learning Multimodal Retrieval for Embodied Agents