DeNAS-ViT: Data Efficient NAS-Optimized Vision Transformer for Ultrasound Image Segmentation

📅 2024-07-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address poor image quality and scarce annotations in ultrasound image segmentation—which limit model performance and induce overfitting—this work pioneers the integration of Neural Architecture Search (NAS) into the Vision Transformer framework. We propose a token-level multi-scale architecture search mechanism to automatically optimize feature extraction, and further design an NAS-guided staged semi-supervised learning framework that incorporates network independence constraints and contrastive learning to enhance robustness under limited labeling. The method jointly models local anatomical details and global contextual dependencies, effectively mitigating overfitting in data-scarce regimes. Evaluated on multiple public ultrasound benchmarks, our approach achieves state-of-the-art performance, surpassing fully supervised baselines using only 10% labeled data, and demonstrates promising cross-modality transferability.

📝 Abstract
Accurate segmentation of ultrasound images is essential for reliable medical diagnoses but is challenged by poor image quality and scarce labeled data. Prior approaches have relied on manually designed, complex network architectures to improve multi-scale feature extraction. However, such handcrafted models offer limited gains when prior knowledge is inadequate and are prone to overfitting on small datasets. In this paper, we introduce DeNAS-ViT, a data-efficient NAS-optimized Vision Transformer, the first method to leverage neural architecture search (NAS) for ultrasound image segmentation by automatically optimizing model architecture through token-level search. Specifically, we propose an efficient NAS module that performs multi-scale token search prior to the ViT's attention mechanism, effectively capturing both contextual and local features while minimizing computational costs. Given ultrasound's data scarcity and NAS's inherent data demands, we further develop an NAS-guided semi-supervised learning (SSL) framework. This approach integrates network independence and contrastive learning within a stage-wise optimization strategy, significantly enhancing model robustness under limited-data conditions. Extensive experiments on public datasets demonstrate that DeNAS-ViT achieves state-of-the-art performance, maintaining robustness with minimal labeled data. Moreover, we highlight DeNAS-ViT's generalization potential beyond ultrasound imaging, underscoring its broader applicability.
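The multi-scale token search described above can be pictured as a differentiable (DARTS-style) mixture over candidate patch scales, where learnable architecture weights decide how much each tokenization granularity contributes before tokens enter the attention blocks. The sketch below is an illustrative reconstruction under that assumption; the class name, candidate scales, and mixing scheme are our own choices, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleTokenSearch(nn.Module):
    """Illustrative DARTS-style search over candidate patch scales.

    Each branch tokenizes the input at a different patch size; a softmax
    over learnable architecture weights (alpha) mixes the branches on a
    common token grid before the sequence is fed to ViT attention.
    All names/scales here are assumptions for illustration.
    """

    def __init__(self, in_ch=1, dim=64, img_size=32, scales=(4, 8, 16)):
        super().__init__()
        # One patch-embedding branch per candidate scale.
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, dim, kernel_size=s, stride=s) for s in scales
        )
        self.grid = img_size // min(scales)          # finest token grid side
        self.alpha = nn.Parameter(torch.zeros(len(scales)))  # architecture weights

    def forward(self, x):
        w = F.softmax(self.alpha, dim=0)             # relaxed architecture choice
        mixed = 0.0
        for wi, branch in zip(w, self.branches):
            tok = branch(x)                          # (B, dim, H/s, W/s)
            # Resample coarser token maps to the finest grid so every
            # branch yields the same sequence length before mixing.
            tok = F.interpolate(tok, size=(self.grid, self.grid), mode="nearest")
            mixed = mixed + wi * tok
        return mixed.flatten(2).transpose(1, 2)      # (B, N_tokens, dim)
```

After the search phase, one would typically discretize `alpha` (e.g. keep the argmax branch) to obtain the final architecture; the relaxed mixture shown here is only the search-time form.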
Problem

Research questions and friction points this paper is trying to address.

Automating optimal architecture search for ultrasound image segmentation tasks
Addressing data scarcity challenges in medical imaging through semi-supervised learning
Enhancing multi-scale feature extraction while minimizing computational resource requirements
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automated neural architecture search for ultrasound segmentation
Multi-scale token search before ViT attention mechanism
NAS-guided semi-supervised learning with contrastive optimization
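The contrastive-optimization component listed above is commonly realized as an InfoNCE loss between embeddings of two augmented views. The snippet below is a generic sketch of that standard formulation, not the paper's exact objective (which additionally involves a network-independence constraint within a staged SSL schedule).

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    """Generic InfoNCE contrastive loss between two views' embeddings.

    z1, z2: (B, D) embeddings of matched views; row i of z1 and row i of
    z2 form the positive pair, all other rows serve as negatives.
    Illustrative only; the paper's formulation may differ.
    """
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature       # (B, B) cosine-similarity matrix
    targets = torch.arange(z1.size(0))       # positives sit on the diagonal
    return F.cross_entropy(logits, targets)
```

Pulling matched views together while pushing apart other samples in the batch is what lets the unlabeled portion of the data shape the representation under the 10%-label regime reported above.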
Renqi Chen
Southern University of Science and Technology, Fudan University
AI4Science · Large Language Model · Multi-Modal Language Model · Mars Computing
Xinzhe Zheng
National University of Singapore
AI for Biomedicine · AI for Science
Haoyang Su
Australian Institute for Machine Learning, The University of Adelaide, Adelaide, Australia
Kehan Wu
Southern University of Science and Technology, Shenzhen, China