Recursive Block-Diagonal Coupling for Resource-Efficient Training of Vision Models

📅 2026-05-22

🤖 AI Summary
Training high-capacity vision models is computationally expensive, and existing model-growing approaches assume that pre-trained narrow models are already available, so their reported costs understate the total training cost. This work proposes Recursive Block-Diagonal Coupling (RBDC), a training protocol that recursively couples independently trained narrow models into a wide model through a block-diagonal structure that adds no parameters. On ImageNet, RBDC reduces training FLOPs by 30% relative to standard from-scratch training at similar test accuracy. At equal training FLOPs, it outperforms training protocols from the model-growth literature, and the resulting models serve as better backbones than their original counterparts for downstream object detection and instance segmentation.
📝 Abstract
Training high-capacity vision models from scratch requires substantial computational resources. To improve the training efficiency of a wide target model, existing growth methods often assume that narrower models are already available, which obscures the true computational cost of the entire pipeline. We propose an efficient training protocol, RBDC, that builds wide models by recursively coupling narrower, independently trained models in a parameter-free, block-diagonal way. This allows the available training budget to be allocated flexibly across all the models involved. Evaluated with vision transformers (DeiT) and convolutional networks (ResNet) on ImageNet, our RBDC training protocol is substantially more efficient than training from scratch with the standard protocol, yielding a 30% reduction in FLOPs at similar test accuracy. It also achieves higher performance at the same training FLOPs than training protocols from the model growth literature. Finally, we show that our models can serve as better backbones than their original counterparts for downstream object detection and instance segmentation tasks.
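To make the block-diagonal idea concrete, here is a minimal NumPy sketch of coupling linear layers. It is an illustration under stated assumptions, not the paper's implementation: the function names `block_diag_couple` and `couple_recursively`, the pairwise coupling order, and the toy layer sizes are all hypothetical, and the summary above does not say how RBDC handles training after each coupling step. The sketch only shows the property the abstract names: placing narrow weight matrices on the diagonal of a wider matrix adds no new parameters, and the wide layer reproduces the narrow layers' outputs on the concatenated input.

```python
import numpy as np


def block_diag_couple(w_a, w_b):
    """Place two weight matrices on the diagonal of a wider zero matrix."""
    out_a, in_a = w_a.shape
    out_b, in_b = w_b.shape
    wide = np.zeros((out_a + out_b, in_a + in_b), dtype=w_a.dtype)
    wide[:out_a, :in_a] = w_a   # top-left block: first narrow layer
    wide[out_a:, in_a:] = w_b   # bottom-right block: second narrow layer
    return wide                 # off-diagonal blocks stay zero


def couple_recursively(weights):
    """Couple neighbours pairwise, level by level, until one matrix remains.

    Assumption: pairwise order is chosen for illustration only.
    """
    level = list(weights)
    while len(level) > 1:
        nxt = [block_diag_couple(level[i], level[i + 1])
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:     # odd count: carry the last one up
            nxt.append(level[-1])
        level = nxt
    return level[0]


# Toy example: four hypothetical narrow layers, each mapping 2 -> 3 features.
rng = np.random.default_rng(0)
narrow = [rng.standard_normal((3, 2)) for _ in range(4)]
inputs = [rng.standard_normal(2) for _ in range(4)]

wide = couple_recursively(narrow)

# The wide layer on the concatenated input matches the narrow layers run
# independently and concatenated.
wide_out = wide @ np.concatenate(inputs)
narrow_out = np.concatenate([w @ x for w, x in zip(narrow, inputs)])
print(wide.shape, np.allclose(wide_out, narrow_out))
```

In this sketch the only stored values are the narrow weights themselves; the zeros off the diagonal carry no information, which is one way to read "parameter-free" coupling.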
Problem

Research questions and friction points this paper is trying to address.

resource-efficient training
vision models
computational cost
model growth
training efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Recursive Block-Diagonal Coupling
Resource-Efficient Training
Model Growth
Vision Transformers
Training Efficiency