🤖 AI Summary
This work addresses feature collapse in existing Transformer architecture search (TAS) methods, where weight sharing in the supernet prevents subnetworks from learning distinct representations. To mitigate this, the study introduces low-rank adaptation (LoRA) into neural architecture search for the first time, proposing a Mixture-of-LoRA Experts (MoLE) mechanism. MoLE uses a lightweight router that dynamically assigns LoRA experts based on each subnetwork's architecture, and a group-wise router initialization strategy that encourages expert diversity early in training. This approach alleviates feature collapse while maintaining computational efficiency. Experiments on ImageNet and multiple transfer learning benchmarks show that the proposed method significantly outperforms current architecture search techniques and consistently improves subnetwork performance.
📝 Abstract
Transformer architecture search (TAS) automatically discovers optimal vision transformer (ViT) architectures, reducing the human effort needed to design ViTs manually. However, existing TAS methods suffer from the feature collapse problem: subnets within a supernet fail to learn subnet-specific features, mainly because weights are shared across the supernet, which limits the performance of individual subnets. To address this, we propose TAS-LoRA, a novel method that introduces parameter-efficient low-rank adaptation (LoRA) to enable subnet-specific feature learning while maintaining computational efficiency. TAS-LoRA incorporates a Mixture-of-LoRA Experts (MoLE) strategy, in which a lightweight router dynamically assigns LoRA experts based on subnet architectures, and introduces a group-wise router initialization technique to encourage diverse feature learning across experts early in training. Extensive experiments on ImageNet and several transfer learning benchmarks, including CIFAR-10/100, Flowers, CARS, and INAT-19, demonstrate that TAS-LoRA effectively mitigates feature collapse and significantly improves performance over state-of-the-art TAS methods.
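The abstract does not give implementation details, so the following is only a minimal sketch of the general idea: a shared linear weight augmented by a router-weighted mixture of LoRA experts, with the router conditioned on a subnet descriptor. Everything concrete here is an assumption made for illustration, not the authors' code: the class name `MoLELinear`, the 3-dimensional architecture encoding `arch`, softmax gating, the number of experts, and the rank. The group-wise router initialization is not specified in the abstract and is left out; the router starts at zero, which yields uniform gates.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D vector.
    z = np.exp(x - np.max(x))
    return z / z.sum()

class MoLELinear:
    """Shared linear layer plus a gated mixture of LoRA experts (illustrative only)."""

    def __init__(self, d_in, d_out, n_experts=4, rank=2, arch_dim=3, seed=0):
        rng = np.random.default_rng(seed)
        # Weight shared by every subnet in the supernet.
        self.W = rng.normal(0.0, 0.02, (d_out, d_in))
        # Per-expert low-rank factors: delta_e = B_e @ A_e, with rank << d_in, d_out.
        self.A = rng.normal(0.0, 0.02, (n_experts, rank, d_in))
        # Zero-initialized up-projection (common LoRA practice): no change at start.
        self.B = np.zeros((n_experts, d_out, rank))
        # Lightweight router: one linear map from the architecture encoding to expert logits.
        self.R = np.zeros((n_experts, arch_dim))

    def route(self, arch):
        # Gate values over experts for a given subnet descriptor (assumed softmax gating).
        return softmax(self.R @ arch)

    def forward(self, x, arch):
        gates = self.route(arch)
        # Gate-weighted sum of the low-rank updates, shape (d_out, d_in).
        delta = np.einsum("e,eor,eri->oi", gates, self.B, self.A)
        return x @ (self.W + delta).T

# Hypothetical subnet descriptor, e.g. normalized (embedding dim, heads, MLP ratio).
layer = MoLELinear(d_in=8, d_out=6)
x = np.ones((2, 8))
arch = np.array([0.5, 0.25, 1.0])
y = layer.forward(x, arch)
gates = layer.route(arch)
```

In this sketch the shared weight `W` is common to all subnets, while the gate vector depends on the subnet descriptor, so different subnets can receive different effective weights at the cost of only the low-rank factors and a small router.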