🤖 AI Summary
To address the high computational cost and poor long-context extrapolation of Transformers in in-context imitation learning, this paper introduces state space models (SSMs) to few-shot robotic task learning for the first time, proposing an efficient and scalable framework built on the Longhorn architecture. The method performs linear-time inference over long sequences and integrates a context-prompting mechanism that models action sequences while supporting cross-task generalization. Evaluated on the LIBERO benchmark, it significantly outperforms Transformer-based baselines, particularly under low-shot demonstration settings, unseen tasks, and long-horizon scenarios, demonstrating superior robustness and generalization. Key contributions include: (1) establishing the first SSM-based paradigm for in-context imitation learning; (2) overcoming the long-context bottleneck inherent to Transformers; and (3) providing a viable pathway for resource-constrained robotic learning.
📝 Abstract
In-context imitation learning (ICIL) enables robots to learn tasks from prompts consisting of just a handful of demonstrations. By eliminating the need for parameter updates at deployment time, this paradigm supports few-shot adaptation to novel tasks. However, recent ICIL methods rely on Transformers, which have computational limitations and tend to underperform when handling longer prompts than those seen during training. In this work, we introduce RoboSSM, a scalable recipe for in-context imitation learning based on state-space models (SSMs). Specifically, RoboSSM replaces Transformers with Longhorn -- a state-of-the-art SSM that provides linear-time inference and strong extrapolation capabilities, making it well-suited for long-context prompts. We evaluate our approach on the LIBERO benchmark and compare it against strong Transformer-based ICIL baselines. Experiments show that RoboSSM extrapolates effectively to varying numbers of in-context demonstrations, yields high performance on unseen tasks, and remains robust in long-horizon scenarios. These results highlight the potential of SSMs as an efficient and scalable backbone for ICIL. Our code is available at https://github.com/youngjuY/RoboSSM.
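The linear-time and length-extrapolation properties claimed above follow from the recurrent form shared by SSMs in general. The toy sketch below (not Longhorn's actual update rule, whose state dynamics are more sophisticated) illustrates why: a fixed-size hidden state is updated once per token, so a length-T prompt costs O(T) rather than the O(T²) of self-attention, and the recurrence can keep absorbing tokens past any training length.

```python
# Minimal scalar linear SSM, for illustration only. The update
# h_t = a * h_{t-1} + b * x_t, y_t = c * h_t costs O(1) per token,
# independent of how many tokens came before -- hence linear-time
# inference over arbitrarily long in-context prompts.

def ssm_scan(xs, a=0.9, b=0.1, c=1.0):
    """Run a scalar linear SSM over a sequence, returning all outputs."""
    h, ys = 0.0, []
    for x in xs:            # one constant-cost update per token
        h = a * h + b * x   # fixed-size recurrent state update
        ys.append(c * h)    # readout
    return ys

# The same loop processes a prompt of any length with constant memory,
# which is the extrapolation property the abstract highlights.
outputs = ssm_scan([1.0] * 5)
print(outputs[-1])
```

In a real model, `a`, `b`, and `c` become learned (often input-dependent) matrices and the scan runs over high-dimensional states, but the per-token cost stays independent of sequence length.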