AI Summary
This work addresses unsupervised domain adaptation under covariate shift by proposing a pseudo-labeling framework based on kernelized generalized linear models (GLMs). The method generates high-quality pseudo-labels for the target domain through an imputation model constructed from source-domain candidate models trained in batches. Robust model selection is achieved via a two-stage data partitioning strategy combined with ridge-regularized kernelized linear, logistic, and Poisson regressions. Theoretically, the authors establish non-asymptotic excess risk bounds and introduce the notion of "effective labeled sample size" to explicitly quantify the impact of covariate shift on adaptation performance. Experimental results demonstrate that the proposed approach significantly outperforms source-only baselines on both synthetic and real-world datasets.
Abstract
We propose a principled framework for unsupervised domain adaptation under covariate shift in kernel Generalized Linear Models (GLMs), encompassing kernelized linear, logistic, and Poisson regression with ridge regularization. Our goal is to minimize prediction error in the target domain by leveraging labeled source data and unlabeled target data, despite differences in covariate distributions. We partition the labeled source data into two batches: one for training a family of candidate models, and the other for building an imputation model. This imputation model generates pseudo-labels for the target data, enabling robust model selection. We establish non-asymptotic excess-risk bounds that characterize adaptation performance through an "effective labeled sample size", explicitly accounting for the unknown covariate shift. Experiments on synthetic and real datasets demonstrate consistent performance gains over source-only baselines.
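The two-batch procedure described above can be sketched for the squared-loss (kernel ridge regression) case. This is a minimal illustration, not the paper's exact method: the RBF kernel, the ridge-penalty grid, and the synthetic covariate shift (source and target covariates drawn from shifted Gaussians under the same regression function) are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf_kernel(A, B, gamma=1.0):
    # Pairwise RBF (Gaussian) kernel matrix between rows of A and rows of B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def fit_krr(X, y, lam, gamma=1.0):
    # Kernel ridge regression: solve (K + lam * n * I) alpha = y,
    # then predict on new points Z via K(Z, X) @ alpha.
    n = len(X)
    alpha = np.linalg.solve(rbf_kernel(X, X, gamma) + lam * n * np.eye(n), y)
    return lambda Z: rbf_kernel(Z, X, gamma) @ alpha

# Synthetic covariate shift (illustrative): source covariates ~ N(0, 1),
# target covariates ~ N(1, 1), identical regression function f on both domains.
f = lambda x: np.sin(2 * x[:, 0])
Xs = rng.normal(0.0, 1.0, size=(200, 1))
ys = f(Xs) + 0.1 * rng.normal(size=200)     # labeled source data
Xt = rng.normal(1.0, 1.0, size=(100, 1))    # unlabeled target covariates

# Two-stage partition of the labeled source data.
X1, y1 = Xs[:100], ys[:100]   # batch 1: train the family of candidate models
X2, y2 = Xs[100:], ys[100:]   # batch 2: build the imputation model

# Candidate models over an (assumed) grid of ridge penalties.
lambdas = [1e-3, 1e-2, 1e-1, 1.0]
candidates = [fit_krr(X1, y1, lam) for lam in lambdas]

# The imputation model, fit on batch 2, generates pseudo-labels on the target.
imputer = fit_krr(X2, y2, lam=1e-2)
pseudo_y = imputer(Xt)

# Model selection: keep the candidate with the smallest empirical risk
# on the target covariates, measured against the pseudo-labels.
risks = [np.mean((g(Xt) - pseudo_y) ** 2) for g in candidates]
best = candidates[int(np.argmin(risks))]
```

The key point the sketch illustrates is that model selection happens where the model will be deployed: the candidates compete on target covariates, with pseudo-labels standing in for the unavailable target labels.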