Constraint Multi-class Positive and Unlabeled Learning for Distantly Supervised Named Entity Recognition

📅 2025-04-07
🤖 AI Summary
Distantly supervised named entity recognition (DS-NER) suffers from high false-negative rates because the underlying knowledge bases are incomplete. To address this, we propose the Constraint Multi-class Positive and Unlabeled (CMPU) learning framework, a novel approach that, for the first time, incorporates non-negativity constraints into the risk estimator of multi-class PU learning, thereby relaxing the implicit assumption of positive-example completeness inherent in conventional PU methods. In theory, the constraint makes the model more robust and mitigates overfitting; it is combined with explicit modeling of distant-supervision noise and training by risk minimization. Evaluated on two benchmark datasets annotated via multiple heterogeneous knowledge bases, CMPU consistently outperforms state-of-the-art DS-NER methods, achieving absolute F1-score improvements of 3.2 to 5.8 percentage points. These results empirically support both the effectiveness and the generalizability of the constrained risk estimation strategy.

📝 Abstract
Distantly supervised named entity recognition (DS-NER) has been proposed to exploit training data labeled automatically by external knowledge bases instead of human annotators. However, it tends to suffer from a high false negative rate because those knowledge bases are inherently incomplete. To address this issue, we present a novel approach called Constraint Multi-class Positive and Unlabeled Learning (CMPU), which introduces a constraint factor on the risk estimator of multiple positive classes. We suggest that the constrained non-negative risk estimator is more robust against overfitting than previous PU learning methods when positive data are limited. A theoretical analysis of CMPU is provided to establish the validity of the approach. Extensive experiments on two benchmark datasets labeled using diverse external knowledge sources demonstrate the superior performance of CMPU compared with existing DS-NER methods.
Problem

Research questions and friction points this paper is trying to address.

Reducing false negatives in distantly supervised NER
Handling incomplete labels via multi-class PU learning
Improving robustness against overfitting with constraints
Innovation

Methods, ideas, or system contributions that make the work stand out.

Constraint factor on multi-class risk estimator
Robust non-negative risk estimator against overfitting
Superior performance on diverse benchmark datasets
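
To make the idea of a non-negative multi-class PU risk concrete, here is a minimal sketch. It is not the CMPU estimator from the paper, whose constraint factor is not specified on this page. It assumes the generic form used in non-negative PU learning, R = Σ_i π_i R_{P_i}(i) + max(0, R_U(neg) − Σ_i π_i R_{P_i}(neg)), with a bounded loss 1 − p(label). The function names, the `priors` argument, and the choice of class 0 as the negative ("O") class are all hypothetical.

```python
import numpy as np


def bounded_loss(probs: np.ndarray, label: int) -> np.ndarray:
    """Per-token loss 1 - p(label | x), bounded in [0, 1]. (Assumed loss.)"""
    return 1.0 - probs[:, label]


def nn_multiclass_pu_risk(probs_pos, probs_unl, priors, neg_class=0):
    """Non-negative multi-class PU risk (illustrative sketch only).

    probs_pos: dict mapping positive class id -> (n_i, K) array of predicted
               class probabilities for tokens the dictionary labeled as class i.
    probs_unl: (n_u, K) array of predicted probabilities for unlabeled tokens.
    priors:    dict mapping positive class id -> assumed class prior pi_i.
    """
    pos_risk = 0.0        # sum_i pi_i * R_{P_i}(i)
    neg_correction = 0.0  # sum_i pi_i * R_{P_i}(neg)
    for cls, probs in probs_pos.items():
        pi = priors[cls]
        pos_risk += pi * bounded_loss(probs, cls).mean()
        neg_correction += pi * bounded_loss(probs, neg_class).mean()

    # Estimated risk on true negatives: unlabeled risk minus the share
    # contributed by positives hidden in the unlabeled set.
    neg_risk = bounded_loss(probs_unl, neg_class).mean() - neg_correction

    # Clamp at zero: a true risk cannot be negative, and a negative
    # estimate is a symptom of overfitting to the labeled positives.
    return pos_risk + max(0.0, neg_risk)


# Overfit-like case: the unclamped estimate would be -0.3.
pos = {1: np.array([[0.0, 1.0], [0.0, 1.0]])}
unl = np.array([[1.0, 0.0], [1.0, 0.0]])
risk_clamped = nn_multiclass_pu_risk(pos, unl, priors={1: 0.3})

# Uninformative model: every token gets uniform probabilities.
uniform = np.full((4, 2), 0.5)
risk_uniform = nn_multiclass_pu_risk({1: uniform}, uniform, priors={1: 0.2})
```

In the first call the negative-risk term is clamped to zero rather than driving the total below zero, which is the behavior the non-negativity constraint is meant to enforce. In practice the class priors are unknown and must be estimated, which this sketch does not attempt.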
Yuzhe Zhang
School of Management, University of Science and Technology of China, No. 96 Jinzhao Road, Hefei, 230026, Anhui, China.
Min Cen
University of Science and Technology of China
Hong Zhang
School of Management, University of Science and Technology of China, No. 96 Jinzhao Road, Hefei, 230026, Anhui, China.