🤖 AI Summary
In long-horizon multi-task settings with scarce target-domain data and frozen models, this paper proposes an in-context policy adaptation framework for rapid cross-domain transfer of skill-based reinforcement learning policies. The method requires no parameter updates at inference time and integrates diffusion-model-driven skill learning with cross-domain consistency modeling. Key contributions include: (1) learning domain-agnostic prototype skill representations; (2) introducing a cross-domain skill diffusion mechanism to enhance skill generalization; and (3) designing a dynamic domain prompting strategy to improve context awareness. Evaluated on the Metaworld and CARLA benchmarks, the approach significantly outperforms existing zero-shot and few-shot adaptation methods, achieving an average 23.6% improvement in adaptation performance across diverse cross-domain configurations.
📝 Abstract
In this work, we present an in-context policy adaptation (ICPAD) framework designed for long-horizon multi-task environments, exploring diffusion-based skill learning techniques in cross-domain settings. The framework enables rapid adaptation of skill-based reinforcement learning policies to diverse target domains, even under the stringent constraints of no model updates and only limited target-domain data. Specifically, the framework employs a cross-domain skill diffusion scheme, in which domain-agnostic prototype skills and a domain-grounded skill adapter are learned jointly and effectively from an offline dataset through cross-domain consistent diffusion processes. The prototype skills act as primitives for common behavior representations of long-horizon policies, serving as a lingua franca to bridge different domains. Furthermore, to enhance in-context adaptation performance, we develop a dynamic domain prompting scheme that guides the diffusion-based skill adapter toward better alignment with the target domain. Through experiments with robotic manipulation in Metaworld and autonomous driving in CARLA, we show that our ICPAD framework achieves superior policy adaptation performance under limited target-domain data conditions for various cross-domain configurations, including differences in environment dynamics, agent embodiment, and task horizon.