🤖 AI Summary
This work addresses the challenge of cross-domain policy transfer when expert demonstrations in the target domain are scarce and costly to obtain. It proposes the first semi-supervised cross-domain imitation learning framework, which leverages only a small number of labeled expert trajectories from the target domain alongside abundant unlabeled, suboptimal trajectories from the source domain. By integrating an adaptive weighted loss function, a cross-domain state-action mapping module, and a distribution alignment mechanism, the method combines knowledge from both domains with theoretical guarantees. Experimental results on MuJoCo and Robosuite benchmarks demonstrate that the proposed algorithm significantly outperforms existing approaches, achieving stable and data-efficient policy transfer with minimal supervision in the target domain.
📝 Abstract
Cross-domain imitation learning (CDIL) accelerates policy learning by transferring expert knowledge across domains, which is valuable in applications where collecting expert data is costly. Existing methods are either supervised, relying on proxy tasks and explicit alignment, or unsupervised, aligning distributions without paired data but often suffering from instability. We introduce the Semi-Supervised CDIL (SS-CDIL) setting and propose the first algorithm for SS-CDIL with theoretical justification. Our method uses only offline data, comprising a small number of target expert demonstrations and a set of unlabeled imperfect trajectories. To handle domain discrepancy, we propose a novel cross-domain loss function for learning inter-domain state-action mappings and design an adaptive weight function to balance source and target knowledge. Experiments on MuJoCo and Robosuite show consistent gains over the baselines, demonstrating that our approach achieves stable and data-efficient policy learning with minimal supervision. Our code is available at https://github.com/NYCU-RL-Bandits-Lab/CDIL.
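To make the adaptive-weighting idea concrete, here is a minimal toy sketch of how source and target imitation losses might be balanced by a weight that shrinks as the estimated domain gap grows. The function names (`adaptive_weight`, `combined_loss`), the `domain_gap` input, and the sigmoid-style schedule are all hypothetical illustrations, not the paper's actual formulation.

```python
import numpy as np

def adaptive_weight(domain_gap, temperature=1.0):
    """Hypothetical schedule: weight on source-domain knowledge
    decays smoothly as the estimated domain gap grows."""
    return 1.0 / (1.0 + np.exp(domain_gap / temperature))

def combined_loss(source_loss, target_loss, domain_gap):
    """Blend the two imitation losses with the adaptive weight:
    small gap -> lean on abundant source data,
    large gap -> trust the few target expert demonstrations."""
    w = adaptive_weight(domain_gap)
    return w * source_loss + (1.0 - w) * target_loss
```

For example, with zero estimated gap the two losses are weighted equally, while a large gap shifts almost all weight onto the target-domain loss; the paper's learned weight function presumably serves the same balancing role in a more principled way.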