🤖 AI Summary
To address the degradation of model performance in semi-supervised text classification caused by noisy pseudo-labels and extreme class imbalance, this paper proposes a multi-head consistency-regularization matching framework. Its core innovation is a three-fold pseudo-label weighting module that integrates multi-head co-training, self-adaptive thresholding for pseudo-label selection, and Average Pseudo-Margin-based difficulty-aware weighting, enabling robust pseudo-label selection, noise filtering, and discriminative weighting in a single unified component. The framework improves both generalization and robustness under noisy and long-tailed label distributions. Experiments across five standard NLP benchmarks and ten imbalance settings show state-of-the-art (SOTA) performance in nine of the ten setups; a Friedman-test ranking places the method first overall; and in highly imbalanced scenarios it outperforms the second-best approach by 3.26% on average.
📝 Abstract
We introduce MultiMatch, a novel semi-supervised learning (SSL) algorithm combining the paradigms of co-training and consistency regularization with pseudo-labeling. At its core, MultiMatch features a three-fold pseudo-label weighting module that serves three key purposes: selecting pseudo-labels based on head agreement, filtering them based on model confidence, and weighting them according to the perceived classification difficulty. This novel module enhances and unifies three existing techniques: heads agreement from Multihead Co-training, self-adaptive thresholds from FreeMatch, and Average Pseudo-Margins from MarginMatch, resulting in a holistic approach that improves robustness and performance in SSL settings. Experimental results on benchmark datasets highlight the superior performance of MultiMatch, which achieves state-of-the-art results on 9 out of 10 setups from 5 natural language processing datasets and ranks first among 19 methods according to the Friedman test. Furthermore, MultiMatch demonstrates exceptional robustness in highly imbalanced settings, outperforming the second-best approach by 3.26%, a property that matters because data imbalance is a key factor in many text classification tasks.
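To make the three-fold weighting idea concrete, here is a minimal NumPy sketch of how the three criteria could be combined into a per-example pseudo-label weight. Everything below is an illustrative simplification, not the paper's implementation: the function name, the fixed per-class thresholds, the precomputed `avg_margins` array standing in for Average Pseudo-Margins, and the choice of mean confidence as the final weight are all assumptions for the sake of the example (the actual method updates thresholds self-adaptively and derives margins across training epochs).

```python
import numpy as np

def weight_pseudo_labels(head_probs, thresholds, avg_margins, margin_floor=0.0):
    """Illustrative three-fold pseudo-label selection and weighting.

    head_probs:  (H, N, C) softmax outputs of H heads on N unlabeled examples.
    thresholds:  (C,) per-class confidence thresholds (assumed fixed here;
                 MultiMatch adapts them during training, as in FreeMatch).
    avg_margins: (N,) precomputed average pseudo-margin per example
                 (stands in for MarginMatch's APM statistic).
    """
    # Criterion 1: head agreement -- all heads predict the same class.
    per_head_preds = head_probs.argmax(axis=2)                 # (H, N)
    agree = (per_head_preds == per_head_preds[0]).all(axis=0)  # (N,)

    # Ensemble prediction and confidence from the averaged head outputs.
    mean_probs = head_probs.mean(axis=0)                       # (N, C)
    pred = mean_probs.argmax(axis=1)                           # (N,)
    mean_conf = mean_probs.max(axis=1)                         # (N,)

    # Criterion 2: confidence exceeds the (per-class) threshold.
    confident = mean_conf >= thresholds[pred]

    # Criterion 3: average pseudo-margin marks the example as reliable.
    reliable = avg_margins >= margin_floor

    # Accepted examples get a difficulty-aware weight (here: mean confidence,
    # so harder/less certain examples contribute less); the rest get zero.
    mask = agree & confident & reliable
    weights = np.where(mask, mean_conf, 0.0)
    return pred, weights, mask

# Toy usage: 2 heads, 3 unlabeled examples, 2 classes.
head_probs = np.array([
    [[0.90, 0.10], [0.40, 0.60], [0.80, 0.20]],   # head 1
    [[0.80, 0.20], [0.70, 0.30], [0.75, 0.25]],   # head 2
])
thresholds = np.array([0.7, 0.7])
avg_margins = np.array([0.5, 0.2, -0.1])
pred, weights, mask = weight_pseudo_labels(head_probs, thresholds, avg_margins)
# Example 1 fails head agreement; example 2 fails the margin check.
```

In this toy run only the first example passes all three filters, so it is the only one contributing (with weight 0.85) to the unsupervised loss; the design point is that each criterion vetoes a different failure mode (head disagreement, low confidence, historically unstable predictions).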