🤖 AI Summary
This work tackles the high computational cost of Transformer-based 3D medical image segmentation and its strong reliance on annotated data by proposing a lightweight architecture, Light-UNETR, together with a Contextual Synergic Enhancement (CSE) learning strategy. The architecture combines Lightweight Dimension Reductive Attention (LIDR) and a Compact Gated Linear Unit (CGLU), while CSE jointly exploits extrinsic and intrinsic contextual information, substantially improving both model and data efficiency. Evaluated on left atrium segmentation using only 10% of the labeled training data, the method achieves a 1.43% higher Jaccard index than BCP while reducing FLOPs by 90.8% and model parameters by 85.8%.
📝 Abstract
Transformers have shown remarkable performance in 3D medical image segmentation, but their high computational requirements and need for large amounts of labeled data limit their applicability. To address these challenges, we target two crucial aspects: model efficiency and data efficiency. Specifically, we propose Light-UNETR, a lightweight Transformer designed for model efficiency. Light-UNETR features a Lightweight Dimension Reductive Attention (LIDR) module, which reduces spatial and channel dimensions while capturing both global and local features via multi-branch attention. Additionally, we introduce a Compact Gated Linear Unit (CGLU) to selectively control channel interactions with minimal parameters. To boost the data efficiency of Transformers, we further introduce a Contextual Synergic Enhancement (CSE) learning strategy: it first leverages extrinsic contextual information to support learning from unlabeled data via Attention-Guided Replacement, and then applies Spatial Masking Consistency, which exploits intrinsic contextual information to strengthen spatial context reasoning on unlabeled data. Extensive experiments on various benchmarks demonstrate the superiority of our approach in both performance and efficiency. For example, with only 10% labeled data on the Left Atrial Segmentation dataset, our method surpasses BCP by 1.43% in Jaccard index while drastically reducing FLOPs by 90.8% and parameters by 85.8%. Code is released at https://github.com/CUHK-AIM-Group/Light-UNETR.
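The CGLU is described as gating channel interactions with minimal parameters. The exact CGLU design is given in the paper, not here; the sketch below shows only the generic gated linear unit mechanism it builds on, where a learned sigmoid gate selectively scales each output channel (all weight shapes and names are illustrative assumptions).

```python
import numpy as np

def sigmoid(x):
    # numerically standard logistic function
    return 1.0 / (1.0 + np.exp(-x))

def gated_linear_unit(x, w_value, w_gate):
    """Generic GLU sketch (not the paper's exact CGLU).

    x:        (tokens, channels_in)
    w_value:  (channels_in, channels_out) -- value projection
    w_gate:   (channels_in, channels_out) -- gate projection
    The sigmoid gate lies in (0, 1), so it selectively suppresses or
    passes each output channel -- the general idea behind gated
    channel interaction.
    """
    return (x @ w_value) * sigmoid(x @ w_gate)

# Toy usage with random weights
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))      # 4 tokens, 8 input channels
w_v = rng.standard_normal((8, 16))
w_g = rng.standard_normal((8, 16))
y = gated_linear_unit(x, w_v, w_g)
print(y.shape)                       # (4, 16)
```

A "compact" variant would shrink the gate's parameter count (e.g. sharing or low-rank projections); the paper's specific parameter-reduction choice is not reproduced here.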