Pseudo-Labeling for Unsupervised Domain Adaptation with Kernel GLMs

📅 2026-03-19
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses unsupervised domain adaptation under covariate shift by proposing a pseudo-labeling framework based on kernelized generalized linear models (GLMs). The labeled source data are split into two batches: one trains a family of candidate models, and the other builds an imputation model that generates pseudo-labels for the unlabeled target domain. These pseudo-labels enable robust selection among ridge-regularized kernelized linear, logistic, and Poisson regressions. Theoretically, the authors establish non-asymptotic excess-risk bounds and introduce the notion of "effective labeled sample size" to explicitly quantify the impact of covariate shift on adaptation performance. Experiments show that the approach consistently outperforms source-only baselines on both synthetic and real-world datasets.
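The two-batch procedure in the summary can be sketched as follows. This is a minimal illustration, not the paper's implementation: kernel ridge regression stands in for the kernel GLM family, the candidate grid, batch split, and squared-error selection rule are all illustrative assumptions, and the data are synthetic.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(0)

# Synthetic labeled source data and unlabeled target data under covariate shift
# (the target covariate distribution is shifted relative to the source).
n_src, n_tgt = 200, 100
X_src = rng.normal(0.0, 1.0, (n_src, 1))
y_src = np.sin(2 * X_src[:, 0]) + 0.1 * rng.normal(size=n_src)
X_tgt = rng.normal(1.0, 1.0, (n_tgt, 1))

# Step 1: partition the labeled source data into two batches.
half = n_src // 2
X1, y1 = X_src[:half], y_src[:half]    # batch 1: train candidate models
X2, y2 = X_src[half:], y_src[half:]    # batch 2: build the imputation model

# Step 2: train a family of candidate models (illustrative ridge-penalty grid).
alphas = (1e-3, 1e-2, 1e-1, 1.0)
candidates = [KernelRidge(kernel="rbf", alpha=a).fit(X1, y1) for a in alphas]

# Step 3: fit an imputation model on batch 2 and pseudo-label the target data.
imputer = KernelRidge(kernel="rbf", alpha=1e-2).fit(X2, y2)
pseudo_y = imputer.predict(X_tgt)

# Step 4: select the candidate minimizing the pseudo-labeled target risk.
risks = [np.mean((m.predict(X_tgt) - pseudo_y) ** 2) for m in candidates]
best = candidates[int(np.argmin(risks))]
```

The key point the sketch captures is that model selection uses only pseudo-labeled target covariates, so no target labels are ever required.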

๐Ÿ“ Abstract
We propose a principled framework for unsupervised domain adaptation under covariate shift in kernel Generalized Linear Models (GLMs), encompassing kernelized linear, logistic, and Poisson regression with ridge regularization. Our goal is to minimize prediction error in the target domain by leveraging labeled source data and unlabeled target data, despite differences in covariate distributions. We partition the labeled source data into two batches: one for training a family of candidate models, and the other for building an imputation model. This imputation model generates pseudo-labels for the target data, enabling robust model selection. We establish non-asymptotic excess-risk bounds that characterize adaptation performance through an "effective labeled sample size", explicitly accounting for the unknown covariate shift. Experiments on synthetic and real datasets demonstrate consistent performance gains over source-only baselines.
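The abstract names three kernel GLM families with ridge regularization. A hedged sketch of what such a family can look like, using an explicit kernel feature map (Nystroem approximation) composed with standard L2-penalized GLM fits; the paper's exact estimators may differ, and all hyperparameters here are illustrative:

```python
import numpy as np
from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import Ridge, LogisticRegression, PoissonRegressor
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))

def rbf_features():
    # Fresh approximate RBF kernel feature map for each pipeline.
    return Nystroem(kernel="rbf", n_components=50, random_state=1)

# Gaussian response: kernel ridge (linear) regression.
y_lin = X[:, 0] ** 2 + 0.1 * rng.normal(size=300)
lin = make_pipeline(rbf_features(), Ridge(alpha=1.0)).fit(X, y_lin)

# Binary response: kernel logistic regression (L2 penalty plays the ridge role).
y_log = (X[:, 0] + X[:, 1] > 0).astype(int)
log = make_pipeline(rbf_features(), LogisticRegression(C=1.0)).fit(X, y_log)

# Count response: kernel Poisson regression with L2 regularization (log link).
y_poi = rng.poisson(np.exp(0.5 * X[:, 0]))
poi = make_pipeline(rbf_features(), PoissonRegressor(alpha=1.0)).fit(X, y_poi)
```

All three share the same kernel feature map, so the adaptation framework can treat them uniformly and only the GLM link/loss changes across families.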
Problem

Research questions and friction points this paper is trying to address.

unsupervised domain adaptation
covariate shift
kernel GLMs
pseudo-labeling
target prediction error
Innovation

Methods, ideas, or system contributions that make the work stand out.

pseudo-labeling
unsupervised domain adaptation
kernel GLMs
covariate shift
excess-risk bounds
Nathan Weill
Department of IEOR, Columbia University
Kaizheng Wang
Columbia University
Machine Learning · Statistics · Optimization