DASViT: Differentiable Architecture Search for Vision Transformer

📅 2025-07-17
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Vision Transformer (ViT) architectures rely heavily on manual design heuristics, while existing neural architecture search (NAS) methods suffer from prohibitive computational cost and limited architectural novelty. Method: This paper pioneers the application of differentiable architecture search (DARTS) to macro-structural optimization of ViTs. The authors propose an end-to-end joint training framework that uses continuous relaxation to optimize network weights and architecture parameters simultaneously, enabling flexible exploration of encoder layer types, inter-layer connectivity patterns, and module configurations. Contribution/Results: The discovered ViT architecture significantly outperforms ViT-B/16 on ImageNet and other benchmarks: it reduces parameter count by 18% and FLOPs by 22% while achieving higher top-1 accuracy. This demonstrates the method's dual advantage in model efficiency and predictive performance, establishing a scalable and effective paradigm for automated ViT design.
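The continuous relaxation described above can be illustrated with a minimal sketch (hypothetical names, not the paper's actual code): each candidate operation in the search space is blended by a softmax over learnable architecture parameters, and the final architecture is recovered by keeping the highest-weighted candidate. In DASViT the candidates would be ViT encoder components; toy functions stand in for them here.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax."""
    e = np.exp(z - z.max())
    return e / e.sum()

class MixedOp:
    """DARTS-style mixed operation: a weighted sum of candidate ops,
    with weights given by a softmax over architecture parameters alpha.
    In joint training, alpha is updated by gradient descent alongside
    the network weights (omitted in this sketch)."""

    def __init__(self, ops):
        self.ops = ops
        self.alpha = np.zeros(len(ops))  # learnable architecture parameters

    def __call__(self, x):
        w = softmax(self.alpha)
        return sum(wi * op(x) for wi, op in zip(w, self.ops))

    def discretize(self):
        # After the search, keep only the candidate with the largest weight.
        return self.ops[int(np.argmax(self.alpha))]

# Toy candidates standing in for encoder-layer choices.
ops = [lambda x: x, lambda x: 2 * x, lambda x: x ** 2]
mixed = MixedOp(ops)
x = np.array([1.0, 2.0])
y = mixed(x)  # equal weights at init: (x + 2x + x**2) / 3
chosen = mixed.discretize()
```

Because the softmax makes the architecture choice differentiable, gradients of the validation loss can flow into `alpha`, which is what lets DARTS avoid the discrete, evolutionary-style search the abstract contrasts against.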

πŸ“ Abstract
Designing effective neural networks is a cornerstone of deep learning, and Neural Architecture Search (NAS) has emerged as a powerful tool for automating this process. Among the existing NAS approaches, Differentiable Architecture Search (DARTS) has gained prominence for its efficiency and ease of use, inspiring numerous advancements. Since the rise of Vision Transformers (ViT), researchers have applied NAS to explore ViT architectures, often focusing on macro-level search spaces and relying on discrete methods like evolutionary algorithms. While these methods ensure reliability, they face challenges in discovering innovative architectural designs, demand extensive computational resources, and are time-intensive. To address these limitations, we introduce Differentiable Architecture Search for Vision Transformer (DASViT), which bridges the gap in differentiable search for ViTs and uncovers novel designs. Experiments show that DASViT delivers architectures that break traditional Transformer encoder designs, outperform ViT-B/16 on multiple datasets, and achieve superior efficiency with fewer parameters and FLOPs.
Problem

Research questions and friction points this paper is trying to address.

Automate Vision Transformer design via differentiable search
Overcome computational inefficiency in Neural Architecture Search
Discover novel Transformer architectures with fewer parameters
Innovation

Methods, ideas, or system contributions that make the work stand out.

Differentiable Architecture Search for Vision Transformer
Bridges gap in differentiable search for ViTs
Achieves efficiency with fewer parameters and FLOPs
Pengjin Wu
School of Computer Science and Electronic Engineering, University of Surrey, Guildford, UK
Ferrante Neri
Professor of Machine Learning and Artificial Intelligence, NICE group, University of Surrey
Heuristic Optimisation · Neural Architecture Search · Feature Selection · Machine Learning · P Systems
Zhenhua Feng
School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, China