🤖 AI Summary
To address the challenges of poor data reusability and limited model generalizability across heterogeneous robotic platforms in embodied intelligence, this paper proposes a unified framework for generalizable action learning. Our method introduces, for the first time, a learnable universal action token space, built through structure-aware self-supervised action representation learning and unified action tokenization, enabling atomic-behavior alignment across morphologically diverse robots and zero-shot action transfer. A lightweight robot adaptation head further supports rapid platform-specific fine-tuning. Evaluated on multiple real-world and simulated robotic platforms, our 0.5B-parameter model outperforms a state-of-the-art (SOTA) model 14× larger, achieving significant improvements in cross-platform control accuracy and in deployment efficiency on novel robots.
📝 Abstract
Training on diverse, internet-scale data is a key factor in the success of recent large foundation models, yet applying the same recipe to embodied agents has proven difficult. Despite the availability of many crowd-sourced embodied datasets, their action spaces are highly heterogeneous, since different robots have distinct physical embodiments and control interfaces, which poses substantial challenges for developing embodied foundation models from cross-domain data. In this paper, we introduce UniAct, a new embodied foundation modeling framework that operates in a tokenized Universal Action Space. The learned universal actions capture generic atomic behaviors shared across diverse robots by exploiting their common structural features, and they improve cross-domain data utilization and cross-embodiment generalization by eliminating this notorious heterogeneity. Universal actions can be efficiently translated back into heterogeneous actionable commands by simply adding embodiment-specific details, making fast adaptation to new robots simple and straightforward. Our 0.5B instantiation of UniAct outperforms SOTA embodied foundation models 14× its size in extensive evaluations on various real-world and simulation robots, demonstrating exceptional cross-embodiment control and adaptation capability and highlighting the crucial benefit of adopting universal actions. Project page: https://github.com/2toinf/UniAct
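The core architectural idea, a shared discrete codebook of atomic behaviors plus lightweight per-robot decoding heads, can be sketched in a few lines. The following is a minimal illustrative sketch, not the paper's actual implementation: the class names, codebook size, embedding dimension, and linear decoding heads are all assumptions, and the codebook here is randomly initialized rather than learned.

```python
import numpy as np

class UniversalActionSpace:
    """Hypothetical sketch of a tokenized universal action space.

    A codebook of embeddings stands in for the learned universal action
    tokens; tokenization is a vector-quantization-style nearest-neighbor
    lookup. All sizes are illustrative assumptions.
    """

    def __init__(self, num_tokens=256, embed_dim=32, seed=0):
        rng = np.random.default_rng(seed)
        # In UniAct the codebook would be learned; here it is random init.
        self.codebook = rng.normal(size=(num_tokens, embed_dim))

    def tokenize(self, behavior_embedding):
        # Map a continuous behavior embedding to its nearest codebook token.
        dists = np.linalg.norm(self.codebook - behavior_embedding, axis=1)
        return int(np.argmin(dists))


class EmbodimentHead:
    """Lightweight per-robot head: universal token embedding -> actionable
    command in that robot's own (heterogeneous) action space."""

    def __init__(self, embed_dim, action_dim, seed=1):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(size=(embed_dim, action_dim))

    def decode(self, token_embedding):
        # A single linear map; adapting to a new robot only requires
        # fitting this small head, not the shared backbone or codebook.
        return token_embedding @ self.W


space = UniversalActionSpace()
arm_head = EmbodimentHead(embed_dim=32, action_dim=7)     # e.g. a 7-DoF arm
mobile_head = EmbodimentHead(embed_dim=32, action_dim=2)  # e.g. a wheeled base

behavior = np.zeros(32)            # placeholder atomic-behavior embedding
tok = space.tokenize(behavior)     # shared universal action token
code = space.codebook[tok]

# The same universal token decodes into different action spaces.
arm_cmd = arm_head.decode(code)        # shape (7,)
mobile_cmd = mobile_head.decode(code)  # shape (2,)
```

The point of the sketch is the decoupling: heterogeneous robots share one token vocabulary, and only the small embodiment head differs per platform, which is what makes adaptation to a new robot cheap.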