Semi-Supervised Cross-Domain Imitation Learning

📅 2026-02-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of cross-domain policy transfer when expert demonstrations in the target domain are scarce and costly to obtain. It proposes the first semi-supervised cross-domain imitation learning framework that leverages only a small number of labeled expert trajectories from the target domain alongside abundant unlabeled, suboptimal trajectories from the source domain. By integrating an adaptive weighted loss function, a cross-domain state-action mapping module, and a distribution alignment mechanism, the method effectively combines knowledge from both domains with theoretical guarantees. Experimental results on MuJoCo and RoboSuite benchmarks demonstrate that the proposed algorithm significantly outperforms existing approaches, achieving stable, efficient, and data-efficient policy transfer with minimal supervision in the target domain.

📝 Abstract
Cross-domain imitation learning (CDIL) accelerates policy learning by transferring expert knowledge across domains, which is valuable in applications where the collection of expert data is costly. Existing methods are either supervised, relying on proxy tasks and explicit alignment, or unsupervised, aligning distributions without paired data, but often unstable. We introduce the Semi-Supervised CDIL (SS-CDIL) setting and propose the first algorithm for SS-CDIL with theoretical justification. Our method uses only offline data, including a small number of target expert demonstrations and some unlabeled imperfect trajectories. To handle domain discrepancy, we propose a novel cross-domain loss function for learning inter-domain state-action mappings and design an adaptive weight function to balance the source and target knowledge. Experiments on MuJoCo and Robosuite show consistent gains over the baselines, demonstrating that our approach achieves stable and data-efficient policy learning with minimal supervision. Our code is available at https://github.com/NYCU-RL-Bandits-Lab/CDIL.
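The adaptive weighting idea from the abstract — balancing a small amount of trusted target-expert data against abundant mapped source data — can be sketched as below. The weight schedule, the `domain_discrepancy` proxy, and all function names are illustrative assumptions for intuition only, not the paper's actual formulation:

```python
# Hypothetical sketch: blend a target-expert behavior-cloning loss with a
# mapped-source imitation loss, discounting the source term as the
# estimated domain gap grows. All names and the schedule are assumptions.

def adaptive_weight(domain_discrepancy: float, k: float = 5.0) -> float:
    """Return a source weight in (0, 1]: 1.0 when the domains match,
    decaying toward 0 as the estimated discrepancy increases."""
    return 1.0 / (1.0 + k * domain_discrepancy)

def combined_loss(target_bc_loss: float,
                  mapped_source_loss: float,
                  domain_discrepancy: float) -> float:
    """Total loss = target imitation term + adaptively weighted source term."""
    w = adaptive_weight(domain_discrepancy)
    return target_bc_loss + w * mapped_source_loss

# With no domain gap, source knowledge contributes fully; with a large
# gap, its influence shrinks and the target-expert term dominates.
print(combined_loss(0.2, 1.0, 0.0))
print(combined_loss(0.2, 1.0, 10.0))
```

The point of such a schedule is that suboptimal source trajectories help most when the learned state-action mapping makes the two domains look similar, and should be trusted less otherwise.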
Problem

Research questions and friction points this paper is trying to address.

Cross-domain imitation learning
Semi-supervised learning
Domain discrepancy
Policy learning
Offline data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Semi-Supervised Imitation Learning
Cross-Domain Imitation Learning
Offline Reinforcement Learning
Domain Adaptation
Adaptive Weighting
Li-Min Chu
Department of Computer Science, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
Kai-Siang Ma
Department of Computer Science, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
Ming-Hong Chen
Department of Computer Science, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
Ping-Chun Hsieh
Associate Professor, National Chiao Tung University
Multi-Armed Bandits, Reinforcement Learning, Wireless Networks