Semi-Supervised Learning with Balanced Deep Representation Distributions

📅 2026-03-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of poor pseudo-label quality in semi-supervised text classification, which often stems from inter-class representation distribution bias. To mitigate this issue, the authors propose a novel self-training framework that explicitly balances the angular variance of text representations within each class by integrating an angular margin loss with a Gaussian linear transformation—marking the first use of such a mechanism in this context. This approach effectively alleviates representation bias and significantly enhances pseudo-label accuracy. Empirical results demonstrate that the method substantially outperforms existing semi-supervised approaches, particularly under extreme label scarcity, thereby highlighting the critical role of angular variance balancing across classes in improving classification performance.

📝 Abstract
Semi-Supervised Text Classification (SSTC) methods mainly work in the spirit of self-training: they initialize a deep classifier by training over labeled texts, and then alternately predict pseudo-labels for unlabeled texts and retrain the classifier over the mixture of labeled and pseudo-labeled texts. Naturally, their performance is largely affected by the accuracy of the pseudo-labels for unlabeled texts. Unfortunately, pseudo-label accuracy is often low because of the margin bias problem caused by the large differences between the representation distributions of labels in SSTC. To alleviate this problem, we apply the angular margin loss and perform several Gaussian linear transformations to achieve balanced label angle variances, i.e., the variance of label angles of texts within the same label. Higher pseudo-label accuracy can be achieved by constraining all label angle variances to be balanced, where they are estimated over both labeled and pseudo-labeled texts during the self-training loops. With this insight, we propose a novel SSTC method, namely Semi-Supervised Text Classification with Balanced Deep representation Distributions (S2TC-BDD). We implement both multi-class and multi-label classification versions of S2TC-BDD by introducing pseudo-labeling tricks and regularization terms. To evaluate S2TC-BDD, we compare it against the state-of-the-art SSTC methods. Empirical results demonstrate the effectiveness of S2TC-BDD, especially when labeled texts are scarce.
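The two core ingredients the abstract describes — an angular margin loss over normalized representations, and confidence-based pseudo-labeling inside the self-training loop — can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the function names (`angular_margin_logits`, `pseudo_label`) and the hyperparameters (`margin`, `scale`, `threshold`) are assumptions chosen for clarity, and the Gaussian linear transformations for balancing label angle variances are omitted.

```python
import numpy as np

def angular_margin_logits(features, class_weights, margin=0.3, scale=10.0):
    """Additive angular-margin logits over L2-normalized text features.

    `margin` (radians) and `scale` are illustrative values, not the paper's.
    """
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = class_weights / np.linalg.norm(class_weights, axis=1, keepdims=True)
    cos = f @ w.T                                  # cosine of angle to each label direction
    theta = np.arccos(np.clip(cos, -1.0, 1.0))     # label angles
    return scale * np.cos(theta + margin)          # penalize the angle, then rescale

def pseudo_label(logits, threshold=0.9):
    """Predict pseudo-labels for unlabeled texts; keep only confident ones."""
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)      # softmax
    labels = probs.argmax(axis=1)
    mask = probs.max(axis=1) >= threshold          # confidence filter
    return labels, mask
```

In a self-training loop, the confident pseudo-labeled texts selected by `pseudo_label` would be mixed with the labeled texts to retrain the classifier; the margin added to the angle makes the decision boundary stricter for the true label, which is the usual motivation for angular margin losses.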
Problem

Research questions and friction points this paper is trying to address.

Semi-Supervised Text Classification
pseudo-labels
representation distribution imbalance
margin bias
deep representation
Innovation

Methods, ideas, or system contributions that make the work stand out.

semi-supervised text classification
angular margin loss
balanced representation distribution
pseudo-labeling
label angle variance
Changchun Li
Jilin University
Text Classification, Topic Modeling, Weakly Supervised Learning, Partial Label Learning, Semi-supervised Learning
Ximing Li
Jilin University, China; RIKEN AIP, Japan
Weakly-supervised learning, Misinformation analysis
Bingjie Zhang
College of Computer Science and Technology, Jilin University, Changchun 130012, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China
Wenting Wang
Institute of Computational Cosmology, Durham University
Cosmology, Galaxy Formation
Jihong Ouyang
College of Computer Science and Technology, Jilin University, Changchun 130012, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China