🤖 AI Summary
This work proposes a unified policy pretraining approach that integrates offline reinforcement learning with cross-morphology learning to reduce the cost of collecting high-quality demonstration data for diverse robotic platforms. The key innovation lies in a morphology similarity–based static grouping strategy, which mitigates gradient conflicts across morphologies through intra-group gradient updates, outperforming existing conflict-resolution methods. Experiments on a locomotion dataset encompassing 16 distinct robot platforms demonstrate that the proposed method significantly surpasses pure behavior cloning in scenarios dominated by suboptimal trajectories. Moreover, the grouping mechanism substantially enhances both training stability and final policy performance.
📝 Abstract
Scalable robot policy pre-training has been hindered by the high cost of collecting high-quality demonstrations for each platform. In this study, we address this issue by combining offline reinforcement learning (offline RL) with cross-embodiment learning: offline RL leverages both expert and abundant suboptimal data, while cross-embodiment learning aggregates heterogeneous robot trajectories across diverse morphologies to acquire universal control priors. We perform a systematic analysis of this combined paradigm, providing a principled understanding of its strengths and limitations. To evaluate it, we construct a suite of locomotion datasets spanning 16 distinct robot platforms. Our experiments confirm that the combined approach excels at pre-training on datasets rich in suboptimal trajectories, outperforming pure behavior cloning. However, as the proportion of suboptimal data and the number of robot types increase, we observe that conflicting gradients across morphologies begin to impede learning. To mitigate this, we introduce an embodiment-based grouping strategy in which robots are clustered by morphological similarity and the model is updated with a per-group gradient. This simple, static grouping substantially reduces inter-robot conflicts and outperforms existing conflict-resolution methods.
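The static grouping idea can be sketched roughly as follows. This is a minimal illustration only: the morphology feature vectors, the greedy clustering rule, and the similarity threshold are assumptions for exposition, since the abstract does not specify how morphological similarity is computed or how groups are formed.

```python
# Hypothetical sketch of embodiment-based static grouping with intra-group
# gradient updates. Feature encodings and the threshold are illustrative
# assumptions, not the paper's actual configuration.

def group_by_morphology(features, threshold=2.0):
    """Greedy static clustering: a robot joins the first existing group whose
    representative is within `threshold` L1 distance in morphology-feature
    space; otherwise it starts a new group."""
    groups = []  # each group is a list of robot indices
    reps = []    # one representative feature vector per group
    for i, f in enumerate(features):
        for g, rep in enumerate(reps):
            if sum(abs(a - b) for a, b in zip(f, rep)) <= threshold:
                groups[g].append(i)
                break
        else:
            groups.append([i])
            reps.append(list(f))
    return groups

def group_gradient(per_robot_grads, group):
    """Average the per-robot gradients within one group; the shared policy is
    then updated with this intra-group gradient rather than a global average
    over all embodiments, reducing cross-morphology gradient conflicts."""
    n = len(group)
    dim = len(per_robot_grads[0])
    return [sum(per_robot_grads[i][d] for i in group) / n for d in range(dim)]

# Illustrative morphology features: (num_legs, degrees_of_freedom / 10).
features = [(2, 0.6), (2, 0.8), (4, 1.2), (4, 1.2), (6, 1.8)]
groups = group_by_morphology(features)
print(groups)  # bipeds, quadrupeds, and hexapod fall into separate groups
```

In this sketch, each optimization step would sample one group, average that group's per-robot gradients, and apply the result to the shared model; the grouping itself is computed once before training (hence "static").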