Neyman-Pearson multiclass classification under label noise via empirical likelihood

📅 2026-03-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of class-specific error rate control in multiclass Neyman–Pearson (NP) classification under label noise by proposing a novel empirical likelihood–based approach. The method models the relationship between the noisy and true label distributions via an exponential tilting density ratio, and jointly estimates the true posterior probabilities and class priors through an EM algorithm combined with nonparametric inference, yielding consistent and asymptotically normal estimators. These estimates are then used to construct a classifier that satisfies pre-specified class-wise error constraints. According to the authors, this is the first study to integrate empirical likelihood into the noisy-label multiclass NP classification framework, with theoretical guarantees that the resulting classifier satisfies the NP oracle inequality. Experiments demonstrate that the proposed method achieves near-oracle performance in simulations and substantially outperforms existing approaches that ignore label noise.
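The exponential tilting model mentioned above relates two densities by a log-linear reweighting, p1(x) ∝ p0(x)·exp(α + βᵀx). A minimal sketch of this idea (illustrative only, not the paper's estimator): tilting a standard normal base density by exp(βx) yields a normal with mean β, which we can check by self-normalized importance weighting of samples from the base density.

```python
import numpy as np

# Hypothetical illustration of exponential tilting (not the paper's method):
# reweight draws from a base density p0 = N(0, 1) by exp(beta * x). The
# tilted density is N(beta, 1), so the weighted sample mean should be ~beta.
rng = np.random.default_rng(0)
beta = 0.7
x = rng.standard_normal(200_000)      # draws from the base density N(0, 1)
w = np.exp(beta * x)                  # unnormalized tilting weights exp(beta * x)
w /= w.sum()                          # self-normalize (absorbs the constant alpha)
tilted_mean = float(np.sum(w * x))    # mean under the tilted density, close to beta
print(round(tilted_mean, 2))
```

In the paper's setting the same density-ratio form links the feature distributions under noisy and clean labels, which is what makes the clean-label class priors and posteriors identifiable from noisy data.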

📝 Abstract
In many classification problems, the costs of misclassifying observations from different classes can be highly unequal. The Neyman-Pearson multiclass classification (NPMC) framework addresses this issue by minimizing a weighted misclassification risk while imposing upper bounds on class-specific error probabilities. Existing NPMC methods typically assume that training labels are correctly observed. In practice, however, labels are often corrupted by measurement or annotation error, and the effect of such label noise on NPMC procedures remains largely unexplored. We study the NPMC problem when only noisy labels are available in the training data. We propose an empirical likelihood (EL)-based method that relates the distributions of noisy and true labels through an exponential tilting density ratio model. The resulting maximum EL estimators recover the class proportions and posterior probabilities of the clean labels required for error control. We establish consistency, asymptotic normality, and optimal convergence rates for these estimators. Under mild conditions, the resulting classifier asymptotically satisfies NP oracle inequalities with respect to the true labels. An expectation-maximization algorithm computes the maximum EL estimators. Simulations show that the proposed method performs comparably to the oracle classifier trained on clean labels and substantially improves over procedures that ignore label noise.
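The core NP idea in the abstract, bounding a class-specific error rather than the overall error, can be sketched with a simple threshold calibration on held-out scores (names and the Gaussian setup are illustrative, not the paper's algorithm): pick the cutoff as an upper empirical quantile of the protected class's scores so its error stays at or below the target level alpha.

```python
import numpy as np

# Illustrative sketch of class-specific error control, umbrella-style:
# threshold a 1-D score so that the class-0 error is at most alpha on a
# held-out calibration sample, then measure power on class 1.
rng = np.random.default_rng(1)
alpha = 0.10
scores0 = rng.normal(0.0, 1.0, 2000)   # held-out scores for class-0 points
scores1 = rng.normal(1.5, 1.0, 2000)   # scores for class-1 points

# Threshold at the empirical (1 - alpha) quantile of class-0 scores, so
# the fraction of class-0 points exceeding t is at most alpha.
t = float(np.quantile(scores0, 1 - alpha))
type1 = float(np.mean(scores0 > t))    # class-0 (type I) error rate
power = float(np.mean(scores1 > t))    # fraction of class 1 correctly flagged
print(type1, power)
```

The paper's contribution lies upstream of this step: when labels are noisy, the scores (clean-label posteriors) and class proportions must first be recovered, which is what the EL estimators provide.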
Problem

Research questions and friction points this paper is trying to address.

Neyman-Pearson classification
label noise
multiclass classification
error control
empirical likelihood
Innovation

Methods, ideas, or system contributions that make the work stand out.

Neyman-Pearson classification
label noise
empirical likelihood
exponential tilting
error rate control