Semiparametric Learning from Open-Set Label Shift Data

📅 2025-09-17
🤖 AI Summary
This paper addresses the open-set label shift problem, in which the test data may contain a novel class absent from training, so that neither the class proportions nor the novel-class distribution is identifiable without extra assumptions. The authors propose a semiparametric density ratio model framework that ensures identifiability while allowing the novel class to overlap with the known classes, avoiding the restrictive separability conditions and prior knowledge that existing approaches often require. Within this framework, they develop maximum empirical likelihood estimators and confidence intervals for the class proportions, establish their asymptotic validity, design a stable EM algorithm for computation, and construct an approximately optimal classifier based on posterior probabilities. Simulations and a real data application show improved estimation accuracy and classification performance compared with existing approaches.

📝 Abstract
We study the open-set label shift problem, where the test data may include a novel class absent from training. This setting is challenging because both the class proportions and the distribution of the novel class are not identifiable without extra assumptions. Existing approaches often rely on restrictive separability conditions, prior knowledge, or computationally infeasible procedures, and some may lack theoretical guarantees. We propose a semiparametric density ratio model framework that ensures identifiability while allowing overlap between novel and known classes. Within this framework, we develop maximum empirical likelihood estimators and confidence intervals for class proportions, establish their asymptotic validity, and design a stable Expectation-Maximization algorithm for computation. We further construct an approximately optimal classifier based on posterior probabilities with theoretical guarantees. Simulations and a real data application confirm that our methods improve both estimation accuracy and classification performance compared with existing approaches.
Problem

Research questions and friction points this paper is trying to address.

Addresses open-set label shift with novel test classes
Ensures identifiability without restrictive separability conditions
Develops estimators for class proportions with theoretical guarantees
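
For orientation, the setting listed above can be written in standard mixture notation. The symbols below ($K$, $\pi_k$, $f_k$, $g$) are notational assumptions chosen for this sketch and are not taken from the paper; the paper's specific density ratio model is not reproduced here.

```latex
% Assumed notation: classes 1..K are known, class K+1 is the novel class.
% f_k  : density of x given y = k, shared by training and test data under label shift
% \pi_k: class proportions in the test data
% g    : marginal density of x in the test data
g(x) = \sum_{k=1}^{K} \pi_k f_k(x) + \pi_{K+1} f_{K+1}(x),
\qquad \pi_k \ge 0, \quad \sum_{k=1}^{K+1} \pi_k = 1 .

% Why identifiability fails when f_{K+1} is unrestricted:
% for any 0 \le \pi_k' \le \pi_k (k = 1..K), set
\pi_{K+1}' = 1 - \sum_{k=1}^{K} \pi_k' , \qquad
f_{K+1}'(x) = \frac{\pi_{K+1} f_{K+1}(x) + \sum_{k=1}^{K} (\pi_k - \pi_k') f_k(x)}{\pi_{K+1}'} ,
% then f_{K+1}' is a valid density and
g(x) = \sum_{k=1}^{K} \pi_k' f_k(x) + \pi_{K+1}' f_{K+1}'(x) .

% Posterior probability used for classification (Bayes' rule):
P(y = k \mid x) = \frac{\pi_k f_k(x)}{g(x)}, \qquad k = 1, \dots, K+1 .
```

The second display shows that two different sets of proportions can produce the same test density, which is why some structural restriction on the novel-class density is needed before the proportions can be estimated.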
Innovation

Methods, ideas, or system contributions that make the work stand out.

Semiparametric density ratio model framework
Maximum empirical likelihood estimators with confidence intervals
Stable EM algorithm for computing the estimators
Approximately optimal classifier based on posterior probabilities
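
To give a feel for how an EM iteration can recover class proportions and feed a posterior-probability classifier, here is a deliberately simplified sketch. It is not the paper's empirical likelihood procedure: it assumes every class density, including the one standing in for the novel class, is fully known, and only the mixing proportions are estimated. The function names (`em_proportions`, `normal_pdf`) and the Gaussian toy data are hypothetical choices for illustration.

```python
import numpy as np


def em_proportions(dens, n_iter=200, tol=1e-10):
    """EM for mixing proportions with fixed, known component densities.

    dens[i, k] holds f_k(x_i). Returns (pi, post), where post[i, k] is the
    estimated posterior probability that x_i belongs to class k.
    """
    n, m = dens.shape
    pi = np.full(m, 1.0 / m)  # start from uniform proportions
    for _ in range(n_iter):
        # E-step: posterior class probabilities under the current pi
        weighted = dens * pi
        post = weighted / weighted.sum(axis=1, keepdims=True)
        # M-step: new proportions are the average posteriors
        new_pi = post.mean(axis=0)
        converged = np.max(np.abs(new_pi - pi)) < tol
        pi = new_pi
        if converged:
            break
    # Recompute posteriors with the final proportions
    weighted = dens * pi
    post = weighted / weighted.sum(axis=1, keepdims=True)
    return pi, post


def normal_pdf(x, mean, sd=1.0):
    """Density of N(mean, sd^2) evaluated at x."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))


# Toy test sample: two "known" classes and a third class playing the novel one.
# ASSUMPTION: the third density is treated as known, unlike in the paper.
rng = np.random.default_rng(0)
true_pi = np.array([0.5, 0.3, 0.2])
means = np.array([-3.0, 0.0, 3.0])
labels = rng.choice(3, size=5000, p=true_pi)
x = rng.normal(means[labels], 1.0)

dens = np.column_stack([normal_pdf(x, m) for m in means])
pi_hat, post = em_proportions(dens)

# Posterior-probability classifier: assign each point to its most likely class
pred = post.argmax(axis=1)
accuracy = np.mean(pred == labels)
```

Each M-step simply averages the E-step posteriors, which keeps the proportions non-negative and summing to one at every iteration. The same posteriors then drive the classifier.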
Siyan Liu, Nanyang Technological University
Yukun Liu, KLATASDS-MOE, School of Statistics, East China Normal University, Shanghai 200062, China
Qinglong Tian, University of Waterloo (Statistics)
Pengfei Li, Department of Statistics and Actuarial Science, University of Waterloo, Ontario N2L 3G1, Canada
Jing Qin, University of Southern Denmark (Mathematics, Statistics)