🤖 AI Summary
This work addresses a novel hybrid setting where covariate shift and learning from label proportions (LLP) coexist: the source domain contains instance-level labels but suffers from distributional shift relative to the target domain, while the target domain provides only bag-level label proportions (i.e., the fraction of positive instances per bag). We formally define this "covariate-shifted hybrid LLP" problem and propose a theoretically grounded domain adaptation framework. Our method jointly optimizes a weighted feature-alignment loss to mitigate covariate shift and incorporates bag-level label-proportion constraints as a regularizer that encodes the proportion priors. We derive a generalization error bound for the target domain. Extensive experiments on multiple benchmark datasets demonstrate that our approach significantly outperforms standalone LLP methods, conventional domain adaptation baselines, and existing hybrid approaches, achieving an average 5.2% improvement in prediction accuracy.
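The summary describes an objective with three ingredients: a (possibly importance-weighted) supervised loss on source instances, a feature-alignment term between source and target, and a bag-proportion regularizer on target bags. The sketch below is a minimal illustrative stand-in, not the paper's actual losses: it uses a logistic model, a simple mean-feature alignment penalty in place of whatever alignment loss the paper employs, and a squared-error proportion penalty. All function and parameter names (`hybrid_llp_loss`, `alpha`, `beta`, `src_weights`) are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def hybrid_llp_loss(w, Xs, ys, Xt_bags, props, alpha=1.0, beta=1.0, src_weights=None):
    """Illustrative combined objective for covariate-shifted hybrid LLP.

    Three terms (all simplified stand-ins for the paper's unspecified losses):
      1. weighted cross-entropy on labeled (covariate-shifted) source instances,
      2. a mean-feature alignment penalty between weighted source and target data,
      3. a bag-proportion penalty tying mean predicted positives to each bag label.
    """
    if src_weights is None:
        src_weights = np.ones(len(ys))
    # 1. Importance-weighted supervised loss on the source domain.
    ps = sigmoid(Xs @ w)
    ce = -np.average(ys * np.log(ps + 1e-12) + (1 - ys) * np.log(1 - ps + 1e-12),
                     weights=src_weights)
    # 2. Crude alignment: squared distance between (weighted) mean feature vectors.
    Xt_all = np.vstack(Xt_bags)
    mu_s = np.average(Xs, axis=0, weights=src_weights)
    align = np.sum((mu_s - Xt_all.mean(axis=0)) ** 2)
    # 3. Bag-level regularizer: predicted positive fraction vs. given proportion.
    prop = np.mean([(sigmoid(B @ w).mean() - p) ** 2 for B, p in zip(Xt_bags, props)])
    return ce + alpha * align + beta * prop
```

In practice one would minimize this with any gradient-based optimizer over `w` (or over the parameters of a deep feature extractor, as domain adaptation methods typically do).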
📝 Abstract
In many applications, especially due to lack of supervision or privacy concerns, the training data is grouped into bags of instances (feature-vectors), and for each bag we have only an aggregate label derived from the instance-labels in the bag. In learning from label proportions (LLP) the aggregate label is the average of the instance-labels in a bag, and a significant body of work has focused on training models in the LLP setting to predict instance-labels. In practice, however, the training data may include fully supervised albeit covariate-shifted source data, along with the usual target data with bag-labels, and we wish to train a good instance-level predictor on the target domain. We call this the covariate-shifted hybrid LLP problem. Fully supervised covariate-shifted data often carries useful training signals, and the goal is to leverage them for better predictive performance in the hybrid LLP setting. To achieve this, we develop methods for hybrid LLP which naturally incorporate the target bag-labels along with the source instance-labels in the domain adaptation framework. Apart from proving theoretical guarantees bounding the target generalization error, we also conduct experiments on several publicly available datasets showing that our methods outperform LLP and domain adaptation baselines as well as techniques from previous related work.
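To make the aggregate label concrete: in binary LLP, a bag's label is simply the average of its instance labels, i.e. the fraction of positive instances in the bag. A one-line numpy illustration with hypothetical labels:

```python
import numpy as np

# A hypothetical bag of four binary instance labels; in LLP the learner
# never sees these individually, only their average.
instance_labels = np.array([1, 0, 1, 1])
bag_label = instance_labels.mean()  # 0.75, the fraction of positives
```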