🤖 AI Summary
This work addresses the challenge of generalization in cross-embodiment policy transfer, which arises from kinematic discrepancies between robots and the scarcity of real-world demonstration data. To overcome this, the authors propose a decoupled learning framework built on action motifs—structured spatiotemporal behavior patterns shared across embodiments. By integrating vector quantization, progress-aware alignment, and embodiment-adversarial constraints, the method explicitly models these shared motifs while disentangling embodiment-specific factors. A lightweight predictor combined with a flow-matching strategy then enables efficient few-shot transfer. The approach overcomes the representational limitations of conventional shared-private architectures and achieves significant performance gains, outperforming strong baselines by 6.5% in simulation and by 43.7% in real-world environments.
📝 Abstract
While vision-language-action (VLA) models have advanced generalist robotic learning, cross-embodiment transfer remains challenging due to kinematic heterogeneity and the high cost of collecting enough real-world demonstrations to support fine-tuning. Existing cross-embodiment policies typically rely on shared-private architectures, which suffer from the limited capacity of private parameters and lack explicit adaptation mechanisms. To address these limitations, we introduce MOTIF, a framework for efficient few-shot cross-embodiment transfer that decouples embodiment-agnostic spatiotemporal patterns, termed action motifs, from heterogeneous action data. Specifically, MOTIF first learns unified motifs via vector quantization with progress-aware alignment and embodiment-adversarial constraints to ensure temporal and cross-embodiment consistency. We then design a lightweight predictor that infers these motifs from real-time inputs to guide a flow-matching policy, fusing them with robot-specific states to enable action generation on new embodiments. Evaluations across both simulation and real-world environments validate the superiority of MOTIF, which significantly outperforms strong baselines in few-shot transfer scenarios, by 6.5% in simulation and 43.7% in real-world settings. Code is available at https://github.com/buduz/MOTIF.