GrowTAS: Progressive Expansion from Small to Large Subnets for Efficient ViT Architecture Search

📅 2025-12-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing Transformer Architecture Search (TAS) methods rely on monolithic supernet training with weight sharing across all subnetworks, causing severe interference that especially degrades small-model performance through parameter contamination from larger subnetworks. Method: We propose GrowTAS, a progressive expansion framework that starts from small subnetworks and incrementally incorporates larger ones to mitigate weight-sharing conflicts. We further introduce GrowTAS+, which selectively fine-tunes only the parameters of newly added modules, enhancing the accuracy of large subnetworks. Both methods operate on a scalable Vision Transformer (ViT) supernet. Contribution/Results: Evaluated on ImageNet, CIFAR-10/100, Flowers, CARS, and iNaturalist-19, GrowTAS(+) consistently outperforms state-of-the-art TAS approaches: small subnetworks achieve substantial accuracy gains, while large subnetworks benefit from jointly optimized inference efficiency and accuracy. This work pioneers the integration of subnetwork growth mechanisms with selective parameter updating, moving beyond the conventional static supernet paradigm.

📝 Abstract
Transformer architecture search (TAS) aims to automatically discover efficient vision transformers (ViTs), reducing the need for manual design. Existing TAS methods typically train an over-parameterized network (i.e., a supernet) that encompasses all candidate architectures (i.e., subnets). However, all subnets share the same set of weights, which leads to interference that severely degrades the smaller subnets. We have found that well-trained small subnets can serve as a good foundation for training larger ones. Motivated by this, we propose a progressive training framework, dubbed GrowTAS, that begins with training small subnets and gradually incorporates larger ones. This reduces the interference and stabilizes the training process. We also introduce GrowTAS+, which fine-tunes only a subset of weights to further enhance the performance of large subnets. Extensive experiments on ImageNet and several transfer learning benchmarks, including CIFAR-10/100, Flowers, CARS, and INAT-19, demonstrate the effectiveness of our approach over current TAS methods.
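The progressive schedule described in the abstract (train small subnets first, then gradually admit larger ones into the weight-sharing pool) can be sketched as a staged sampling loop. This is a minimal illustration under stated assumptions: `progressive_train`, the stage lists, and the per-step bookkeeping are hypothetical names invented here, not the authors' API, and a real implementation would run a gradient step on the shared ViT weights for each sampled subnet rather than just record its configuration.

```python
import random

def progressive_train(stages, steps_per_stage):
    """Staged supernet training sketch: each stage enlarges the set of
    subnet configurations eligible for sampling, so small subnets are
    trained alone before larger ones start sharing their weights.

    stages: list of lists of subnet configs, ordered small to large,
            e.g. [["S"], ["M"], ["L"]] (illustrative placeholders).
    Returns the sequence of sampled configs (stand-in for train steps).
    """
    active = []    # subnet configs currently admitted to the search space
    history = []
    for stage_subnets in stages:
        active.extend(stage_subnets)       # grow the space progressively
        for _ in range(steps_per_stage):
            cfg = random.choice(active)    # uniform subnet sampling
            history.append(cfg)            # placeholder for a train step
    return history
```

Note how the largest subnets never touch the shared weights during the early stages, which is the mechanism the paper credits with reducing interference against small subnets.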
Problem

Research questions and friction points this paper is trying to address.

Weight sharing across all subnets causes interference in supernet training
Small subnets are degraded most severely under monolithic supernet training
Large subnets need targeted fine-tuning to reach full accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Progressive training from small to large subnets
Fine-tuning subset of weights for large subnets
Reducing interference and stabilizing training process
Hyunju Lee, Yonsei University
Youngmin Oh, Yonsei University
Jeimin Jeon, Yonsei University
Donghyeon Baek, Yonsei University
Bumsub Ham, Yonsei University
Computer vision · Image processing