Semiparametric Learning from Open-Set Label Shift Data

📅 2025-09-17

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper addresses the open-set label shift problem, where the test data may contain a novel class absent from training. Without extra assumptions, neither the class proportions nor the novel-class distribution is identifiable. The authors propose a semiparametric density ratio model framework that ensures identifiability while still allowing the novel class to overlap with the known classes, avoiding the restrictive separability conditions or prior knowledge that existing approaches often rely on. The method combines maximum empirical likelihood estimators for the class proportions, confidence intervals with established asymptotic validity, a stable Expectation-Maximization (EM) algorithm for computation, and an approximately optimal classifier based on posterior probabilities, also with theoretical guarantees. Simulations and a real-data application show improvements in both class proportion estimation accuracy and classification performance compared with existing approaches.

📝 Abstract

We study the open-set label shift problem, where the test data may include a novel class absent from training. This setting is challenging because both the class proportions and the distribution of the novel class are not identifiable without extra assumptions. Existing approaches often rely on restrictive separability conditions, prior knowledge, or computationally infeasible procedures, and some may lack theoretical guarantees. We propose a semiparametric density ratio model framework that ensures identifiability while allowing overlap between novel and known classes. Within this framework, we develop maximum empirical likelihood estimators and confidence intervals for class proportions, establish their asymptotic validity, and design a stable Expectation-Maximization algorithm for computation. We further construct an approximately optimal classifier based on posterior probabilities with theoretical guarantees. Simulations and a real data application confirm that our methods improve both estimation accuracy and classification performance compared with existing approaches.
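
To make the setting concrete, here is one common way to write it down. The notation is an assumption for illustration, not taken from the paper: known classes k = 1, ..., K have training densities f_k, the novel class (index 0) has density f_0, and q(x) is a user-chosen basis such as x or (x, x^2). The second line is a generic semiparametric density ratio model that tilts a reference density f_r (for example, the pooled known-class density); the paper's exact specification may differ.

```latex
% Test density: K known classes plus one novel class (index 0)
f_{\mathrm{te}}(x) = \pi_0 f_0(x) + \sum_{k=1}^{K} \pi_k f_k(x),
\qquad \pi_k \ge 0, \quad \sum_{k=0}^{K} \pi_k = 1.

% Illustrative density ratio model for the novel class relative to f_r
\log \frac{f_0(x)}{f_r(x)} = \alpha + \beta^{\top} q(x),
\qquad \alpha = -\log \int \exp\{\beta^{\top} q(x)\}\, f_r(x)\, dx.
```

Under this kind of model f_0 shares support with the known classes, so overlap is allowed, but it is restricted to a finite-dimensional exponential tilt of f_r rather than being an arbitrary density. Restrictions of this type are what can make (\pi_0, \dots, \pi_K) identifiable.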

Problem

Research questions and friction points this paper is trying to address.

Addresses open-set label shift with novel test classes

Ensures identifiability without restrictive separability conditions

Develops estimators for class proportions with theoretical guarantees
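
The identifiability problem can be seen in a toy discrete example (all probability mass functions below are made up for illustration). When the novel-class density is left completely unrestricted, it can absorb part of a known class, so two different sets of class proportions reproduce exactly the same test distribution.

```python
import numpy as np

# Hypothetical pmfs on three support points x in {0, 1, 2}.
f1 = np.array([0.7, 0.2, 0.1])  # known class 1 (seen in training)
f2 = np.array([0.1, 0.3, 0.6])  # known class 2 (seen in training)
f0 = np.array([0.2, 0.6, 0.2])  # novel class (never seen in training)

# Decomposition A: (pi_1, pi_2, pi_0) = (0.5, 0.3, 0.2)
q_a = 0.5 * f1 + 0.3 * f2 + 0.2 * f0

# Decomposition B: shift 0.1 of class-1 mass into the "novel" component.
# f0_b is a convex combination of f0 and f1, so it is still a valid pmf.
f0_b = (0.2 * f0 + 0.1 * f1) / 0.3
q_b = 0.4 * f1 + 0.3 * f2 + 0.3 * f0_b

# Same test distribution, different class proportions.
print(np.allclose(q_a, q_b))  # True
```

Only the test distribution q is observable, so nothing in the data can distinguish decomposition A from decomposition B. Ruling out such arbitrary novel-class densities is the role of the extra modeling assumption.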

Innovation

Methods, ideas, or system contributions that make the work stand out.

Semiparametric density ratio model framework

Maximum empirical likelihood estimators with confidence intervals

Stable EM algorithm for computation and an approximately optimal posterior-probability classifier
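
The EM and posterior-classifier ideas can be illustrated with a deliberately simplified sketch. It uses the classic EM update for mixture proportions with every component density held fixed, and it assumes that all densities, including the novel class's, are known Gaussians. The paper instead estimates the novel-class component through its semiparametric density ratio model with empirical likelihood, which this sketch does not attempt. The names `gaussian_pdf`, `em_proportions`, and `posterior_classify` are hypothetical.

```python
import numpy as np


def gaussian_pdf(x, mean, sd):
    """Normal density; stands in for class densities that would be estimated in practice."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


def em_proportions(dens, n_iter=1000, tol=1e-10):
    """EM for mixture proportions with all component densities held fixed.

    dens[i, k] is the density of component k evaluated at test point x_i,
    with column 0 reserved for the (hypothetical) novel class.
    """
    n_components = dens.shape[1]
    pi = np.full(n_components, 1.0 / n_components)  # uniform starting values
    for _ in range(n_iter):
        # E-step: responsibilities P(y_i = k | x_i) under the current proportions
        weighted = dens * pi
        resp = weighted / weighted.sum(axis=1, keepdims=True)
        # M-step: updated proportions are the average responsibilities
        new_pi = resp.mean(axis=0)
        converged = np.max(np.abs(new_pi - pi)) < tol
        pi = new_pi
        if converged:
            break
    return pi


def posterior_classify(dens, pi):
    """Plug-in Bayes rule: predict argmax_k pi_k * f_k(x); label 0 means 'novel'."""
    return np.argmax(dens * pi, axis=1)


# Simulated test sample: novel class N(0, 1), known classes N(-3, 1) and N(3, 1).
rng = np.random.default_rng(0)
true_pi = np.array([0.2, 0.5, 0.3])
means = np.array([0.0, -3.0, 3.0])
labels = rng.choice(3, size=5000, p=true_pi)
x = rng.normal(means[labels], 1.0)

dens = np.column_stack([gaussian_pdf(x, mu, 1.0) for mu in means])
pi_hat = em_proportions(dens)
pred = posterior_classify(dens, pi_hat)
accuracy = np.mean(pred == labels)
print("estimated proportions:", np.round(pi_hat, 3))
print("accuracy:", round(float(accuracy), 3))
```

When the novel-class density is unknown, the E-step would also depend on the current density-ratio parameters, and the M-step would update them alongside the proportions. The fixed-density version above only shows the proportion update and the argmax decision rule.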

🔎 Similar Papers

OpenSlot: Mixed Open-set Recognition with Object-centric Learning