🤖 AI Summary
This work addresses the challenge of lacking identifiability guarantees in unsupervised multi-source domain adaptation with high-dimensional data by proposing a general domain adaptation framework grounded in Markov blanket structure. The method learns compact latent representations that capture task-relevant distribution shifts and leverages the Markov blanket of the label—comprising its parents, children, and spouses—to guide identifiable representation learning. It reveals for the first time that representations relying solely on complete predictive information are underdetermined under general settings, thereby circumventing strong assumptions commonly required by existing approaches, such as independent latent variables or invariant label distributions. By integrating causal representation learning with nonparametric deep models, the framework achieves theoretically guaranteed identifiability and significantly improves target-domain generalization across diverse distribution shift scenarios.
📝 Abstract
A central problem in unsupervised domain adaptation is determining what to transfer from labeled source domains to an unlabeled target domain. To handle high-dimensional observations (e.g., images), a line of approaches use deep learning to learn latent representations of the observations, which facilitate knowledge transfer in the latent space. However, existing approaches often rely on restrictive assumptions to establish identifiability of the joint distribution in the target domain, such as independent latent variables or invariant label distributions, limiting their real-world applicability. In this work, we propose a general domain adaptation framework that learns compact latent representations to capture distribution shifts relative to the prediction task and address the fundamental question of what representations should be learned and transferred. Notably, we first demonstrate that learning representations based on all the predictive information, i.e., the label's Markov blanket in terms of the learned representations, is often underspecified in general settings. Instead, we show that, interestingly, general domain adaptation can be achieved by partitioning the representations of Markov blanket into those of the label's parents, children, and spouses. Moreover, its identifiability guarantee can be established. Building on these theoretical insights, we develop a practical, nonparametric approach for domain adaptation in a general setting, which can handle different types of distribution shifts.