Micro Text Classification Based on Balanced Positive-Unlabeled Learning

📅 2025-03-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the dual macro- and micro-level class imbalance arising in text quality control—characterized by sparse negative instances, absent fine-grained labels, and high similarity between positive and negative samples—this paper proposes the Balanced Fine-Grained Positive-Unlabeled (BFGPU) learning framework. Methodologically, it reformulates coarse-grained binary classification as a fine-grained PU learning task; designs a theoretically grounded, two-level imbalance-aware PU loss function; and incorporates pseudo-label rebalancing with dynamic threshold adjustment. Evaluated on multiple public and real-world datasets, BFGPU consistently outperforms state-of-the-art methods, demonstrating superior robustness and accuracy even under extreme class imbalance. The framework effectively mitigates both label scarcity and feature-space ambiguity, enabling reliable fine-grained discrimination without explicit negative supervision.

📝 Abstract
In real-world text classification tasks, negative texts often contain a minimal proportion of negative content, which is especially problematic in areas like text quality control, legal risk screening, and sensitive information interception. This challenge manifests at two levels: at the macro level, distinguishing negative texts is difficult due to the high similarity between coarse-grained positive and negative samples; at the micro level, the issue stems from extreme class imbalance and a lack of fine-grained labels. To address these challenges, we propose transforming the coarse-grained positive-negative (PN) classification task into an imbalanced fine-grained positive-unlabeled (PU) classification problem, supported by theoretical analysis. We introduce a novel framework, Balanced Fine-Grained Positive-Unlabeled (BFGPU) learning, which features a unique PU learning loss function that optimizes macro-level performance amidst severe imbalance at the micro level. The framework's performance is further boosted by rebalanced pseudo-labeling and threshold adjustment. Extensive experiments on both public and real-world datasets demonstrate the effectiveness of BFGPU, which outperforms other methods, even in extreme scenarios where both macro and micro levels are highly imbalanced.
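The abstract refers to a PU loss that optimizes macro-level performance under severe micro-level imbalance, but the exact BFGPU loss is not reproduced on this page. As background only, here is a minimal NumPy sketch of the standard non-negative PU risk estimator (Kiryo et al., 2017) that PU losses of this kind typically build on; the sigmoid surrogate loss and the `prior` value are illustrative assumptions, not the paper's formulation:

```python
import numpy as np

def sigmoid_loss(scores, target):
    # Sigmoid surrogate loss: l(z, t) = 1 / (1 + exp(t * z)).
    return 1.0 / (1.0 + np.exp(target * scores))

def nnpu_risk(scores_p, scores_u, prior):
    """Non-negative PU risk estimator (Kiryo et al., 2017).

    scores_p: classifier scores on labeled positive samples
    scores_u: classifier scores on unlabeled samples
    prior:    assumed class prior pi = P(y = +1), an input assumption
    """
    risk_p_pos = sigmoid_loss(scores_p, +1).mean()  # positives labeled +1
    risk_p_neg = sigmoid_loss(scores_p, -1).mean()  # positives labeled -1
    risk_u_neg = sigmoid_loss(scores_u, -1).mean()  # unlabeled treated as -1
    # Clamp the estimated negative risk at zero; the unclamped estimator can
    # go negative and cause overfitting when the model is flexible.
    neg_risk = max(0.0, risk_u_neg - prior * risk_p_neg)
    return prior * risk_p_pos + neg_risk
```

In practice the estimate is computed per mini-batch and minimized by gradient descent; BFGPU's contribution, per the abstract, is adapting this kind of objective to two-level imbalance, which the sketch above does not attempt.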
Problem

Research questions and friction points this paper is trying to address.

Negative texts contain only a minimal proportion of negative content, so coarse-grained positive and negative samples look highly similar at the macro level.
At the micro level, fine-grained labels are absent and the classes are extremely imbalanced.
Standard positive-negative (PN) classification therefore lacks the supervision needed for reliable fine-grained discrimination.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reformulates the coarse-grained PN task as an imbalanced fine-grained PU learning problem, with theoretical support.
Introduces BFGPU with a PU loss function tailored to two-level imbalance.
Boosts performance further via rebalanced pseudo-labeling and threshold adjustment.
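The rebalanced pseudo-labeling and threshold adjustment are described only at a high level here. One common way to realize such rebalancing, shown purely as an illustrative sketch (the quantile rule, the `prior` parameter, and the `margin` option are assumptions, not the paper's method), is to pick the decision threshold from the score distribution so that the pseudo-labeled positive fraction matches an assumed class prior:

```python
import numpy as np

def rebalanced_pseudo_labels(scores, prior, margin=0.0):
    """Assign pseudo-labels so the positive fraction matches an assumed prior.

    Instead of a fixed cut at zero, the threshold is the (1 - prior) quantile
    of the unlabeled scores, so roughly a `prior` fraction of samples is
    pseudo-labeled positive even when class imbalance skews the scores.
    """
    threshold = np.quantile(scores, 1.0 - prior)
    # Optional margin keeps only confident positives above the threshold.
    labels = np.where(scores >= threshold + margin, 1, -1)
    return labels, threshold
```

For example, with ten evenly spaced scores and `prior=0.3`, the threshold lands so that exactly three samples are pseudo-labeled positive, regardless of where the raw scores are centered.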
👥 Authors
Lin-Han Jia · LAMDA Group, Nanjing University · Machine Learning
Lan-Zhe Guo · LAMDA Group, Nanjing University · Machine Learning
Zhi Zhou · National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China
Si-Ye Han · National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China
Zi-Wen Li · Didi Chuxing, Beijing, China
Yu-Feng Li · Professor, Nanjing University · Machine Learning