Decorrelation Speeds Up Vision Transformers

📅 2025-10-16

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Vision Transformers (ViTs) pretrained with Masked Autoencoders (MAEs) perform strongly in low-label regimes, but their high computational cost hinders industrial deployment. To address this, we integrate **Decorrelated Backpropagation (DBP)** into MAE pre-training. DBP is an optimization method that iteratively reduces input correlations at each layer to accelerate convergence, and applying it selectively to the MAE encoder speeds up pre-training without loss of stability. With ImageNet-1K pre-training and ADE20K fine-tuning, DBP-MAE reduces the wall-clock time needed to reach baseline performance by 21.1% and lowers carbon emissions by 21.4%, while improving semantic segmentation mIoU by 1.1 points. Similar gains are observed when pre-training and fine-tuning on proprietary industrial data, supporting the method's applicability in real-world settings.
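The 21.1% figure is a time-to-target saving: the abstract states it as the reduction in wall-clock time needed to reach baseline performance. The helper below is a minimal sketch of how such a number could be computed from logged evaluation curves. It assumes each run is recorded as (wall-clock time, score) pairs and that "baseline performance" means the last score the baseline logged; the function names are hypothetical, and the paper's actual measurement protocol is not described here.

```python
def time_to_target(times, scores, target, higher_is_better=True):
    """First wall-clock time at which a run's score reaches `target`, else None."""
    for t, s in zip(times, scores):
        reached = s >= target if higher_is_better else s <= target
        if reached:
            return t
    return None


def time_savings(baseline, candidate, higher_is_better=True):
    """Fractional wall-clock saving of `candidate` in reaching the baseline's final score.

    Assumption (hypothetical protocol): each run is a (times, scores) pair of
    equal-length sequences, and "baseline performance" is the baseline's last score.
    """
    base_times, base_scores = baseline
    cand_times, cand_scores = candidate
    target = base_scores[-1]
    # The baseline always reaches its own final score, at the latest on its last log.
    t_base = time_to_target(base_times, base_scores, target, higher_is_better)
    t_cand = time_to_target(cand_times, cand_scores, target, higher_is_better)
    if t_cand is None:
        return None  # candidate never matched the baseline
    return 1.0 - t_cand / t_base
```

For loss curves, pass `higher_is_better=False`. Returning `None` when the candidate never reaches the target avoids reporting a saving for a run that did not match the baseline.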

📝 Abstract

Masked Autoencoder (MAE) pre-training of vision transformers (ViTs) yields strong performance in low-label regimes but comes with substantial computational costs, making it impractical in time- and resource-constrained industrial settings. We address this by integrating Decorrelated Backpropagation (DBP), an optimization method that iteratively reduces input correlations at each layer to accelerate convergence, into MAE pre-training. Applied selectively to the encoder, DBP achieves faster pre-training without loss of stability. On ImageNet-1K pre-training with ADE20K fine-tuning, DBP-MAE reduces the wall-clock time needed to reach baseline performance by 21.1%, lowers carbon emissions by 21.4%, and improves segmentation mIoU by 1.1 points. We observe similar gains when pre-training and fine-tuning on proprietary industrial data, confirming the method's applicability in real-world scenarios. These results demonstrate that DBP can reduce training time and energy use while improving downstream performance for large-scale ViT pre-training.
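The abstract describes DBP only as iteratively reducing input correlations at each layer. The sketch below illustrates that idea under stated assumptions: a decorrelating matrix `R` sits in front of a layer, the layer's weights see z = R x instead of x, and `R` is trained with the simple local rule R ← R − η · offdiag(⟨z zᵀ⟩) · R. This rule is taken from earlier decorrelation work, not from this paper, whose exact update, learning rates, and any variance normalization may differ. All function names are hypothetical, plain Python lists stand in for tensors, and the layer weights (which would still be trained by ordinary backpropagation on z) are omitted.

```python
import random


def matmul(a, b):
    """Multiply matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def decorrelate(R, X):
    """Map a batch X (one sample per row) to Z, where each row is z = R x."""
    return matmul(X, transpose(R))


def offdiag_second_moment(Z):
    """Batch estimate of <z z^T> (uncentered) with its diagonal zeroed out."""
    n, d = len(Z), len(Z[0])
    C = [[sum(z[i] * z[j] for z in Z) / n for j in range(d)] for i in range(d)]
    for i in range(d):
        C[i][i] = 0.0
    return C


def update_R(R, X, lr):
    """One decorrelation step: R <- R - lr * offdiag(<z z^T>) R, with z = R x.

    Assumption: basic off-diagonal rule from prior decorrelation work;
    the DBP-MAE paper's exact rule may differ.
    """
    C = offdiag_second_moment(decorrelate(R, X))
    CR = matmul(C, R)
    d = len(R)
    return [[R[i][j] - lr * CR[i][j] for j in range(d)] for i in range(d)]


def mean_abs_offdiag(Z):
    """Average magnitude of the cross-correlations remaining in Z."""
    C = offdiag_second_moment(Z)
    d = len(C)
    return sum(abs(C[i][j]) for i in range(d) for j in range(d)) / (d * d - d)


def demo(steps=200, lr=0.1, seed=0):
    """Decorrelate toy 3-D inputs whose first two features are strongly correlated."""
    rng = random.Random(seed)
    X = []
    for _ in range(256):
        s = rng.gauss(0.0, 1.0)
        X.append([s + 0.1 * rng.gauss(0.0, 1.0),
                  s + 0.1 * rng.gauss(0.0, 1.0),
                  rng.gauss(0.0, 1.0)])
    R = identity(3)  # start with no decorrelation: z = x
    before = mean_abs_offdiag(decorrelate(R, X))
    for _ in range(steps):
        R = update_R(R, X, lr)
    after = mean_abs_offdiag(decorrelate(R, X))
    return before, after
```

Calling `demo()` returns the mean absolute off-diagonal second moment of the layer inputs before and after 200 decorrelation steps; the second value should be far smaller than the first. In the DBP-MAE setup described above, such matrices would be attached only to encoder layers, leaving the decoder on plain backpropagation.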

Problem

Research questions and friction points this paper is trying to address.

Reducing computational costs of ViT pre-training in resource-limited settings

Accelerating MAE convergence while maintaining model stability

Decreasing training time and carbon emissions for vision transformers

Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates Decorrelated Backpropagation (DBP) into MAE pre-training to accelerate convergence

Applies DBP selectively to the encoder to keep training stable

Reduces training time and energy use while improving downstream performance

🔎 Similar Papers

Efficient Deep Learning with Decorrelated Backpropagation