Transfer Learning under Group-Label Shift: A Semiparametric Exponential Tilting Approach

📅 2025-09-26
🤖 AI Summary
This paper addresses binary classification in transfer learning when both the covariate and the label distributions shift between the source and target domains. The authors propose a "group-label shift" assumption that models subgroup imbalance and spurious correlations, which makes the approach more robust to realistic distributional changes. Methodologically, they use an instrumental variable strategy to establish identifiability, characterize the discrepancy between the source and target joint distributions with a semiparametric exponential tilting model, and extend the theory to non-smooth target functionals such as ROC curves and the AUC. Estimation follows a two-step likelihood procedure that combines logistic regression for the source outcome model with conditional likelihood estimation based on both source and target covariates. On the semi-synthetic Waterbirds benchmark, the method outperforms standard covariate-shift and label-shift baselines and improves target-domain classification accuracy.
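The exponential tilting idea can be illustrated in its simplest, covariate-only form. The following is a minimal sketch that assumes a one-dimensional covariate and a linear tilt p_t(x)/p_s(x) = exp(alpha + beta*x). Under that model, the probability that a pooled observation came from the target domain is a logistic function of x. Fitting a logistic regression of a domain indicator on x therefore recovers beta directly, and it recovers alpha once the offset log(n_t/n_s) is subtracted. This is a standard identity, not the paper's estimator, which tilts the joint distribution of covariates and labels and relies on an instrumental variable for identification. The function names are hypothetical.

```python
import math
import random


def fit_logistic_1d(x, d, iters=25):
    """Fit P(d=1 | x) = sigmoid(a + b*x) by Newton-Raphson (two parameters)."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        # Gradient of the log-likelihood and the (positive) observed information
        g_a = g_b = 0.0
        h_aa = h_ab = h_bb = 0.0
        for xi, di in zip(x, d):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            r = di - p
            g_a += r
            g_b += r * xi
            w = p * (1.0 - p)
            h_aa += w
            h_ab += w * xi
            h_bb += w * xi * xi
        det = h_aa * h_bb - h_ab * h_ab
        # Newton step: theta <- theta + I^{-1} g, using the 2x2 inverse in closed form
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (-h_ab * g_a + h_aa * g_b) / det
    return a, b


def estimate_tilt(x_source, x_target):
    """Estimate (alpha, beta) in the hypothetical tilt p_t(x)/p_s(x) = exp(alpha + beta*x).

    Pool the covariates, label target rows d=1 and source rows d=0, and fit a
    logistic regression. The fitted intercept equals alpha + log(n_t/n_s).
    """
    x = list(x_source) + list(x_target)
    d = [0] * len(x_source) + [1] * len(x_target)
    a, b = fit_logistic_1d(x, d)
    alpha = a - math.log(len(x_target) / len(x_source))
    return alpha, b


# Simulated example: source N(0, 1), target N(1, 1)
rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(4000)]
xt = [rng.gauss(1.0, 1.0) for _ in range(4000)]
alpha_hat, beta_hat = estimate_tilt(xs, xt)
print(alpha_hat, beta_hat)
```

With a N(0, 1) source and a N(1, 1) target, the true log density ratio is x - 1/2. The estimates should therefore land near alpha = -0.5 and beta = 1, up to sampling error.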



📝 Abstract
We propose a new framework for binary classification in transfer learning settings where both covariate and label distributions may shift between source and target domains. Unlike traditional covariate shift or label shift assumptions, we introduce a group-label shift assumption that accommodates subpopulation imbalance and mitigates spurious correlations, thereby improving robustness to real-world distributional changes. To model the joint distribution difference, we adopt a flexible exponential tilting formulation and establish mild, verifiable identification conditions via an instrumental variable strategy. We develop a computationally efficient two-step likelihood-based estimation procedure that combines logistic regression for the source outcome model with conditional likelihood estimation using both source and target covariates. We derive consistency and asymptotic normality for the resulting estimators, and extend the theory to receiver operating characteristic curves, the area under the curve, and other target functionals, addressing the nonstandard challenges posed by plug-in classifiers. Simulation studies demonstrate that our method outperforms existing alternatives under subpopulation shift scenarios. A semi-synthetic application using the Waterbirds dataset further confirms the proposed method's ability to transfer information effectively and improve target-domain classification accuracy.
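The ROC and AUC extension concerns functionals that depend on the classifier only through indicator comparisons of scores. As a rough illustration, suppose each labeled source example carries a weight w_i that stands in for an estimated target-to-source ratio of its joint (x, y) density. A target-domain AUC can then be approximated by an importance-weighted Mann-Whitney statistic. The weighting scheme and the function name are assumptions made for this sketch, not the paper's estimator. Because the indicator 1{s_i > s_j} is a step function of the scores, the estimate is not differentiable in the classifier's parameters. This is the kind of non-smooth plug-in functional the abstract refers to.

```python
def weighted_auc(scores, labels, weights=None):
    """Importance-weighted empirical AUC in Mann-Whitney form (hypothetical helper).

    Each (positive, negative) pair contributes w_i * w_j * 1{s_i > s_j}, with
    ties counted as 1/2. With all weights equal to 1, this is the ordinary
    empirical AUC.
    """
    if weights is None:
        weights = [1.0] * len(scores)
    pos = [(s, w) for s, y, w in zip(scores, labels, weights) if y == 1]
    neg = [(s, w) for s, y, w in zip(scores, labels, weights) if y == 0]
    num = 0.0
    for sp, wp in pos:
        for sn, wn in neg:
            if sp > sn:
                num += wp * wn          # correctly ordered pair
            elif sp == sn:
                num += 0.5 * wp * wn    # tie: half credit
    den = sum(w for _, w in pos) * sum(w for _, w in neg)
    return num / den


# Unweighted: 5 of the 6 (positive, negative) pairs are ordered correctly
print(weighted_auc([0.9, 0.8, 0.3, 0.7, 0.2], [1, 1, 1, 0, 0]))
# Up-weighting the low-scoring positive lowers the estimate
print(weighted_auc([0.9, 0.8, 0.3, 0.7, 0.2], [1, 1, 1, 0, 0], [1, 1, 2, 1, 1]))
```

The double loop is quadratic in the sample size. A sort-based implementation would scale better, but the pairwise form keeps the indicator structure visible.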
Problem


Addresses binary classification when covariate and label distributions shift jointly
Mitigates spurious correlations via group-label shift assumption
Improves robustness to real-world subpopulation distribution changes
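For contrast, consider the classical label shift assumption, in which p(x | y) is the same in both domains and only the class prior changes. Under that assumption, a source posterior can be adapted to the target by reweighting with the prior ratio: p_t(y=1 | x) is proportional to p_s(y=1 | x) * pi_t / pi_s. The minimal sketch below implements this standard correction. It is one of the baseline assumptions the paper moves beyond, not the proposed method, and the function name is hypothetical.

```python
def label_shift_posterior(p_source, prior_source, prior_target):
    """Adjust P_s(y=1 | x) to the target domain under pure label shift.

    Assumes p(x | y) is identical in both domains and only the class prior
    P(y=1) changes from prior_source to prior_target.
    """
    w1 = prior_target / prior_source                 # reweighting for class 1
    w0 = (1.0 - prior_target) / (1.0 - prior_source) # reweighting for class 0
    num = p_source * w1
    return num / (num + (1.0 - p_source) * w0)      # renormalize over both classes


# A source posterior of 0.5 with a prior moving from 0.5 to 0.8
print(label_shift_posterior(0.5, 0.5, 0.8))
```

When a spurious feature, such as the background in Waterbirds, is correlated with the label differently across subgroups, p(x | y) itself changes. In that case, this prior-only correction no longer holds, which is the gap the group-label shift assumption is meant to address.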
Innovation


Group-label shift assumption for robustness
Exponential tilting with instrumental variables
Two-step likelihood-based estimation procedure
Manli Cheng
Department of Statistics and Actuarial Science, University of Waterloo
Subha Maity
University of Waterloo
Transfer learning, Distribution shift, Algorithmic fairness
Qinglong Tian
University of Waterloo
statistics
Pengfei Li
Department of Statistics and Actuarial Science, University of Waterloo