🤖 AI Summary
This work addresses the challenge of cross-domain reinforcement learning, where the absence of environment interaction and reward signals in the target domain hinders direct policy learning. The authors propose the first application of diffusion Schrödinger bridges to this setting, aligning source-domain transition distributions with the target-domain dynamics encoded in offline demonstrations through a principled probabilistic framework. By integrating a state-transition-based reward modulation mechanism, the method enables effective policy transfer without requiring any interaction with the target environment, simultaneously preserving dynamics consistency and ensuring reward compatibility. Evaluated on standard MuJoCo cross-domain benchmarks, the proposed method significantly outperforms existing approaches, demonstrating robust adaptability to shifts in environment dynamics.
📝 Abstract
Cross-domain reinforcement learning (RL) aims to learn transferable policies under dynamics shifts between source and target domains. A key challenge lies in the lack of target-domain environment interaction and reward supervision, which prevents direct policy learning. To address this challenge, we propose Bridging Dynamics Gaps for Cross-Domain Reinforcement Learning (BDGxRL), a novel framework that leverages the Diffusion Schrödinger Bridge (DSB) to align source transitions with target-domain dynamics encoded in offline demonstrations. Moreover, we introduce a reward modulation mechanism that estimates rewards from state transitions and applies them to DSB-aligned samples, ensuring consistency between rewards and target-domain dynamics. BDGxRL performs target-oriented policy learning entirely within the source domain, without access to the target environment or its rewards. Experiments on MuJoCo cross-domain benchmarks demonstrate that BDGxRL outperforms state-of-the-art baselines and shows strong adaptability under transition dynamics shifts.
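To make the pipeline described above concrete, here is a minimal, hypothetical sketch of the relabeling step: source transitions are pushed through a pretrained bridge toward target dynamics, then rescored by a state-transition-based reward model before being handed to the policy learner. The names `bridge`, `reward_model`, and the transition-tuple layout are illustrative assumptions, not the paper's actual interfaces.

```python
# Illustrative sketch only: a pretrained DSB-style bridge aligns source
# successor states with target dynamics, and a reward model defined on
# state transitions relabels the aligned samples. All interfaces here
# are assumptions for exposition, not the authors' implementation.
from typing import Callable, List, Tuple
import numpy as np

Transition = Tuple[np.ndarray, np.ndarray, float, np.ndarray]  # (s, a, r, s')

def relabel_with_bridge(
    source_batch: List[Transition],
    bridge: Callable[[np.ndarray, np.ndarray], np.ndarray],   # maps (s, s') to a target-like s'
    reward_model: Callable[[np.ndarray, np.ndarray], float],  # estimates r(s, s') from states only
) -> List[Transition]:
    """Align source transitions with target dynamics and modulate their rewards."""
    relabeled = []
    for s, a, _r_src, s_next in source_batch:
        s_next_aligned = bridge(s, s_next)           # bridge alignment toward target dynamics
        r_aligned = reward_model(s, s_next_aligned)  # reward consistent with the aligned transition
        relabeled.append((s, a, r_aligned, s_next_aligned))
    return relabeled

# Toy usage with stand-in models:
batch = [(np.zeros(3), np.zeros(1), 0.0, np.ones(3))]
out = relabel_with_bridge(
    batch,
    bridge=lambda s, sn: sn,  # identity stand-in for a trained bridge
    reward_model=lambda s, sn: -float(np.linalg.norm(sn - s)),
)
```

Note the design choice implied by the abstract: because the reward model conditions only on state transitions (not actions), the estimated reward remains meaningful after the bridge replaces the successor state, which is what keeps rewards consistent with target-domain dynamics.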