🤖 AI Summary
To address the dual macro- and micro-level challenges arising in text quality control—high similarity between positive and negative samples at the macro level, and extreme class imbalance with sparse negative instances and absent fine-grained labels at the micro level—this paper proposes the Balanced Fine-Grained Positive-Unlabeled (BFGPU) learning framework. Methodologically, it reformulates the coarse-grained binary classification task as fine-grained PU learning; designs a theoretically grounded PU loss function that accounts for imbalance at both levels; and incorporates pseudo-label rebalancing with dynamic threshold adjustment. Across multiple public and real-world datasets, BFGPU consistently outperforms state-of-the-art methods, remaining robust and accurate even under extreme class imbalance. The framework mitigates both label scarcity and feature-space ambiguity, enabling reliable fine-grained discrimination without explicit negative supervision.
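The summary's "pseudo-label rebalancing with dynamic threshold adjustment" can be illustrated with a minimal sketch. This is **not** the paper's actual algorithm (which is not specified here); it is a generic, hypothetical version of the idea: when a fixed confidence threshold selects too few minority-class (negative) pseudo-labels, the threshold is relaxed step by step until a target count is reached, counteracting the imbalance.

```python
import numpy as np

def rebalanced_pseudo_labels(neg_probs, target_count,
                             init_thresh=0.95, step=0.05, min_thresh=0.5):
    """Illustrative sketch (not BFGPU's exact procedure).

    neg_probs:    predicted probability of the negative class
                  for each unlabeled text
    target_count: desired number of negative pseudo-labels
                  (e.g. chosen from an assumed class prior)

    Lowers the selection threshold dynamically until enough
    minority-class pseudo-labels are selected, or a floor is hit.
    """
    thresh = init_thresh
    mask = neg_probs >= thresh
    while mask.sum() < target_count and thresh > min_thresh:
        thresh -= step                 # relax the threshold
        mask = neg_probs >= thresh     # reselect pseudo-negatives
    return mask, thresh
```

A fixed threshold of 0.95 would pseudo-label almost no negatives under severe imbalance; the dynamic relaxation trades some label noise for a usable number of minority-class examples.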
📝 Abstract
In real-world text classification tasks, negative texts often contain a minimal proportion of negative content, which is especially problematic in areas like text quality control, legal risk screening, and sensitive information interception. This challenge manifests at two levels: at the macro level, distinguishing negative texts is difficult due to the high similarity between coarse-grained positive and negative samples; at the micro level, the issue stems from extreme class imbalance and a lack of fine-grained labels. To address these challenges, we propose transforming the coarse-grained positive-negative (PN) classification task into an imbalanced fine-grained positive-unlabeled (PU) classification problem, supported by theoretical analysis. We introduce a novel framework, Balanced Fine-Grained Positive-Unlabeled (BFGPU) learning, which features a unique PU learning loss function that optimizes macro-level performance amidst severe imbalance at the micro level. The framework's performance is further boosted by rebalanced pseudo-labeling and threshold adjustment. Extensive experiments on both public and real-world datasets demonstrate the effectiveness of BFGPU, which outperforms other methods, even in extreme scenarios where both macro and micro levels are highly imbalanced.
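The PN-to-PU reformulation described above builds on standard PU risk estimation. As a point of reference (the abstract does not spell out BFGPU's own loss, which additionally handles the two-level imbalance), a minimal sketch of the widely used non-negative PU risk estimator looks like this: the negative-class risk is estimated from unlabeled data with a positive-class correction, and clamped at zero to prevent overfitting. The class prior `pi_p` is an assumed input.

```python
import numpy as np

def sigmoid_loss(z):
    # smooth surrogate loss l(z) = sigmoid(-z)
    return 1.0 / (1.0 + np.exp(z))

def nn_pu_risk(scores_p, scores_u, pi_p):
    """Standard non-negative PU risk estimate (sketch, not BFGPU's loss).

    scores_p: classifier scores on labeled positive texts
    scores_u: classifier scores on unlabeled texts
    pi_p:     assumed class prior of positives in the unlabeled set
    """
    # risk of misclassifying positives as negative
    risk_p_pos = pi_p * np.mean(sigmoid_loss(scores_p))
    # negative-class risk: unlabeled risk minus the positive contribution
    risk_u_neg = np.mean(sigmoid_loss(-scores_u))
    risk_p_neg = pi_p * np.mean(sigmoid_loss(-scores_p))
    neg_risk = risk_u_neg - risk_p_neg
    # clamp at zero: a negative estimate signals overfitting to positives
    return risk_p_pos + max(neg_risk, 0.0)
```

The clamping is what makes the estimator usable with flexible models; an imbalance-aware variant, as BFGPU proposes, would further reweight the two risk terms so the scarce negative class is not drowned out.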