🤖 AI Summary
In continuous domain adaptation (CDA), selecting intermediate domains that bridge the source-to-target shift is difficult when no explicit metadata or labels are available. To address this, we propose an end-to-end framework that integrates reinforcement learning (RL) with feature disentanglement. The method introduces an unsupervised, label-free reward that scores the quality of a transfer path from the distances between latent domain embeddings. At the same time, it disentangles domain-specific from domain-invariant features, so that intermediate domain selection and feature adaptation are optimized jointly. The framework unifies RL-driven sequential decision-making over intermediate domains, domain-adversarial training, and adaptive feature learning. Experiments on Rotated MNIST and the ADNI dataset show substantial improvements in both predictive accuracy and domain selection efficiency over traditional CDA approaches. Key contributions include: (i) a metadata-agnostic RL formulation for intermediate domain selection; (ii) joint optimization of path discovery and feature disentanglement; and (iii) empirical validation on a synthetic benchmark (Rotated MNIST) and a real-world medical benchmark (ADNI).
📝 Abstract
Continuous Domain Adaptation (CDA) bridges large domain shifts by adapting progressively from the source domain, through intermediate domains, to the target domain. However, selecting intermediate domains without explicit metadata remains a substantial challenge that existing studies have not explored extensively. To tackle this issue, we propose a framework that combines reinforcement learning with feature disentanglement to select domain paths in an unsupervised CDA setting. Our approach introduces an unsupervised reward mechanism that uses the distances between latent domain embeddings to identify optimal transfer paths. Feature disentanglement serves two purposes: the domain-specific features are used to compute the unsupervised rewards, and the domain-invariant features are aligned to promote domain adaptation. This integrated strategy is designed to optimize transfer paths and target task performance simultaneously. Empirical evaluations on datasets including Rotated MNIST and ADNI demonstrate substantial improvements in prediction accuracy and domain selection efficiency over traditional CDA approaches.
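To make the idea of a label-free, distance-based reward concrete, the following is a minimal sketch and not the paper's implementation. The abstract does not give the reward formula, so everything here is an assumption made for illustration: the embedding is taken to be the mean of a domain's domain-specific feature vectors, the reward is progress toward the target minus a squared step-size penalty, and a greedy selector stands in for the learned RL policy. The names `domain_embedding`, `step_reward`, `greedy_path`, and the `penalty` weight are hypothetical.

```python
import math


def domain_embedding(features):
    """Assumed embedding: the mean of a domain's domain-specific feature vectors."""
    n = len(features)
    dim = len(features[0])
    return [sum(f[d] for f in features) / n for d in range(dim)]


def step_reward(current, candidate, target, penalty=0.5):
    """Hypothetical label-free reward for moving from `current` to `candidate`.

    Rewards progress toward the target embedding and penalizes large jumps,
    so that gradual paths score higher than a single source-to-target hop.
    """
    progress = math.dist(current, target) - math.dist(candidate, target)
    step = math.dist(current, candidate)
    return progress - penalty * step ** 2


def greedy_path(embeddings, source, target, penalty=0.5):
    """Greedy stand-in for the RL policy: always take the highest-reward step.

    Assumes both `source` and `target` are keys of `embeddings`.
    """
    path = [source]
    current = source
    while current != target:
        # Only unvisited domains are candidates, so the loop always terminates.
        candidates = [name for name in embeddings if name not in path]
        current = max(
            candidates,
            key=lambda name: step_reward(
                embeddings[current], embeddings[name], embeddings[target], penalty
            ),
        )
        path.append(current)
    return path


# Toy embeddings placed on a line, loosely mimicking rotation angles.
toy_embeddings = {
    "source": [0.0, 0.0],
    "rot30": [2.0, 0.0],
    "rot15": [1.0, 0.0],
    "target": [4.0, 0.0],
    "rot45": [3.0, 0.0],
}

if __name__ == "__main__":
    print(greedy_path(toy_embeddings, "source", "target"))
```

With these toy embeddings the greedy selector visits the domains in order of increasing distance from the source, even though the dictionary lists them out of order. In the framework described above, a learned policy would replace the greedy rule, and the embeddings would come from the disentangled domain-specific features rather than hand-picked points.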