🤖 AI Summary
This paper addresses multi-source unsupervised domain adaptation, where multiple labeled source domains and a single unlabeled target domain exhibit significant heterogeneity in the conditional distributions of discrete labels. To tackle this, we propose the Conditional Group Distributionally Robust Optimization (CG-DRO) framework, which minimizes the worst-case cross-entropy loss over convex combinations of the source-domain conditional outcome distributions, thereby enhancing robust cross-domain generalization. Our method integrates a double machine learning procedure into an efficient Mirror Prox algorithm for the resulting minimax problem, and pairs it with a perturbation-based inference procedure that overcomes the failure of standard statistical inference under nonstandard asymptotics. We establish that the estimator achieves a fast convergence rate and supports uniformly valid inference, even under boundary effects, enabling confidence interval construction and hypothesis testing.
📝 Abstract
In multi-source learning with discrete labels, distributional heterogeneity across domains poses a central challenge to developing predictive models that transfer reliably to unseen domains. We study multi-source unsupervised domain adaptation, where labeled data are drawn from multiple source domains and only unlabeled data are available from the target domain. To address potential distribution shifts, we propose a novel Conditional Group Distributionally Robust Optimization (CG-DRO) framework that learns a classifier by minimizing the worst-case cross-entropy loss over convex combinations of the conditional outcome distributions from the sources. To solve the resulting minimax problem, we develop an efficient Mirror Prox algorithm, in which we employ a double machine learning procedure to estimate the risk function. This ensures that the errors of the machine learning estimators for the nuisance models enter only at higher-order rates, thereby preserving statistical efficiency under covariate shift. We establish fast statistical convergence rates for the estimator by constructing two surrogate minimax optimization problems that serve as theoretical bridges. A distinguishing challenge for CG-DRO is the emergence of nonstandard asymptotics: the empirical estimator may fail to converge to a standard limiting distribution due to boundary effects and system instability. To address this, we introduce a perturbation-based inference procedure that enables uniformly valid inference, including confidence interval construction and hypothesis testing.
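The group-DRO minimax structure described above, min over the model and max over mixture weights on the simplex, can be illustrated with a minimal Mirror Prox sketch. This is not the paper's estimator: the quadratic group losses stand in for the cross-entropy risks, the toy source "centers", step size, and iteration count are all illustrative assumptions, and the double machine learning and perturbation steps are omitted. The sketch only shows the extragradient update pattern of Mirror Prox, with a Euclidean step for the model parameters and an entropic (multiplicative-weights) step for the simplex weights.

```python
import numpy as np

# Toy CG-DRO-style objective: min_theta max_{q in simplex} sum_k q_k * L_k(theta).
# Here L_k(theta) = 0.5 * ||theta - c_k||^2 is a stand-in for the source-k risk;
# the centers c_k are hypothetical "per-source optima", not from the paper.
rng = np.random.default_rng(0)
K, d = 3, 2
centers = rng.normal(size=(K, d))

def group_losses(theta):
    # vector of per-source losses L_k(theta), shape (K,)
    return 0.5 * np.sum((theta - centers) ** 2, axis=1)

def grad_theta(theta, q):
    # gradient of the q-weighted loss sum_k q_k * L_k(theta) w.r.t. theta
    return (q[:, None] * (theta - centers)).sum(axis=0)

def mirror_prox(steps=500, eta=0.1):
    theta = np.zeros(d)
    q = np.full(K, 1.0 / K)  # uniform mixture weights on the simplex
    for _ in range(steps):
        # 1) extrapolation (lookahead) step at the current point
        theta_half = theta - eta * grad_theta(theta, q)
        q_half = q * np.exp(eta * group_losses(theta))  # entropic mirror ascent
        q_half /= q_half.sum()
        # 2) actual update, using gradients evaluated at the lookahead point
        theta = theta - eta * grad_theta(theta_half, q_half)
        q = q * np.exp(eta * group_losses(theta_half))
        q /= q.sum()
    return theta, q

theta_star, q_star = mirror_prox()
print("theta:", theta_star, "weights:", q_star)
```

The extrapolation step is what distinguishes Mirror Prox from plain mirror descent-ascent: both players take a lookahead step, then redo the update with gradients from the lookahead point, which stabilizes convergence for smooth convex-concave saddle problems like this one.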