Semi-Supervised Masked Autoencoders: Unlocking Vision Transformer Potential with Limited Data

📅 2026-01-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the performance degradation of Vision Transformers (ViTs) under limited labeled data by proposing the Semi-Supervised Masked Autoencoder (SSMAE). SSMAE jointly optimizes masked image reconstruction and classification, leveraging both labeled and unlabeled data. A key innovation is a validation-based gating mechanism that activates pseudo-labeling only when the model produces high-confidence predictions that are consistent across weakly and strongly augmented views of the same image, thereby mitigating confirmation bias. By integrating masked autoencoding, semi-supervised learning, dynamic pseudo-label selection, and consistency regularization, the method significantly outperforms fully supervised ViTs and fine-tuned MAE baselines. Notably, on CIFAR-10 with only 10% labeled data, SSMAE achieves a 9.24% absolute accuracy improvement over the supervised ViT baseline.
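The joint objective described above (masked reconstruction on all images plus classification on labeled ones) can be sketched in a few lines. This is a minimal numpy illustration, not the paper's implementation; the function names, tensor shapes, and the weighting `lam` are assumptions.

```python
import numpy as np

def masked_recon_loss(pred_patches, target_patches, patch_mask):
    # MAE-style loss: mean squared error averaged only over masked patches.
    se = ((pred_patches - target_patches) ** 2).mean(axis=-1)
    return (se * patch_mask).sum() / patch_mask.sum()

def cross_entropy(logits, labels):
    # Standard softmax cross-entropy over class logits (numerically stable).
    z = logits - logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def ssmae_loss(pred_patches, target_patches, patch_mask,
               logits, labels, lam=1.0):
    # Joint objective: reconstruction on labeled + unlabeled images,
    # plus classification on labeled ones. `lam` is an assumed weighting.
    return (masked_recon_loss(pred_patches, target_patches, patch_mask)
            + lam * cross_entropy(logits, labels))
```

In this sketch `patch_mask` marks which patches were hidden from the encoder, so the reconstruction term is only scored where the model had to in-paint, mirroring the MAE recipe.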

📝 Abstract
We address the challenge of training Vision Transformers (ViTs) when labeled data is scarce but unlabeled data is abundant. We propose Semi-Supervised Masked Autoencoder (SSMAE), a framework that jointly optimizes masked image reconstruction and classification using both unlabeled and labeled samples with dynamically selected pseudo-labels. SSMAE introduces a validation-driven gating mechanism that activates pseudo-labeling only after the model achieves reliable, high-confidence predictions that are consistent across both weakly and strongly augmented views of the same image, reducing confirmation bias. On CIFAR-10 and CIFAR-100, SSMAE consistently outperforms supervised ViT and fine-tuned MAE, with the largest gains in low-label regimes (+9.24% over ViT on CIFAR-10 with 10% labels). Our results demonstrate that when pseudo-labels are introduced is as important as how they are generated for data-efficient transformer training. Code is available at https://github.com/atik666/ssmae.
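The validation-driven gate and the weak/strong consistency check can be sketched as follows. This is an illustrative numpy sketch of the selection logic only; the thresholds (`val_gate`, `conf_thresh`) and function names are assumptions, not values from the released code.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def select_pseudo_labels(weak_logits, strong_logits, val_acc,
                         val_gate=0.7, conf_thresh=0.95):
    """Return (mask, labels): which unlabeled samples get pseudo-labels.

    Sketch of the gating idea (thresholds are assumptions):
      1. Pseudo-labeling stays inactive until validation accuracy
         passes `val_gate`.
      2. A sample qualifies only if the weak-view prediction is
         confident (max prob >= conf_thresh) AND agrees with the
         strong-view prediction.
    """
    n = weak_logits.shape[0]
    if val_acc < val_gate:                 # gate closed: no pseudo-labels yet
        return np.zeros(n, dtype=bool), np.full(n, -1)
    probs_w = softmax(weak_logits)
    pred_w = probs_w.argmax(axis=-1)
    pred_s = strong_logits.argmax(axis=-1)
    confident = probs_w.max(axis=-1) >= conf_thresh
    consistent = pred_w == pred_s
    mask = confident & consistent
    labels = np.where(mask, pred_w, -1)    # -1 marks rejected samples
    return mask, labels
```

The two conditions are complementary: the confidence threshold filters uncertain predictions, while the weak/strong agreement check rejects predictions that flip under stronger augmentation, which is the abstract's mechanism for reducing confirmation bias.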
Problem

Research questions and friction points this paper is trying to address.

Semi-Supervised Learning
Vision Transformers
Limited Labeled Data
Pseudo-Labeling
Masked Autoencoders
Innovation

Methods, ideas, or system contributions that make the work stand out.

Semi-Supervised Learning
Masked Autoencoder
Vision Transformer
Pseudo-Labeling
Data-Efficient Training