🤖 AI Summary
This work tackles the high computational cost of Transformer-based 3D medical image segmentation and its strong reliance on annotated data by proposing a lightweight architecture, Light-UNETR, together with a Contextual Synergic Enhancement (CSE) learning strategy. The architecture combines Lightweight Dimension Reductive Attention (LIDR) and a Compact Gated Linear Unit (CGLU), while CSE jointly exploits extrinsic and intrinsic contextual information, substantially improving both model and data efficiency. Evaluated on left atrium segmentation using only 10% of the labeled training data, the method achieves a 1.43% higher Jaccard index than BCP while reducing FLOPs by 90.8% and model parameters by 85.8%.
📝 Abstract
Transformers have shown remarkable performance in 3D medical image segmentation, but their high computational requirements and need for large amounts of labeled data limit their applicability. To address these challenges, we target two crucial aspects: model efficiency and data efficiency. Specifically, we propose Light-UNETR, a lightweight Transformer designed for model efficiency. Light-UNETR features a Lightweight Dimension Reductive Attention (LIDR) module, which reduces spatial and channel dimensions while capturing both global and local features via multi-branch attention. Additionally, we introduce a Compact Gated Linear Unit (CGLU) to selectively control channel interactions with minimal parameters. To boost the data efficiency of Transformers, we further introduce a Contextual Synergic Enhancement (CSE) learning strategy: it first leverages extrinsic contextual information to support learning from unlabeled data via Attention-Guided Replacement, and then applies Spatial Masking Consistency, which exploits intrinsic contextual information to strengthen spatial context reasoning on unlabeled data. Extensive experiments on various benchmarks demonstrate the superiority of our approach in both performance and efficiency. For example, with only 10% labeled data on the Left Atrial Segmentation dataset, our method surpasses BCP by 1.43% in Jaccard index while drastically reducing FLOPs by 90.8% and parameters by 85.8%. Code is released at https://github.com/CUHK-AIM-Group/Light-UNETR.
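The CGLU is described as gating channel interactions with minimal parameters. The exact CGLU design is given in the paper, not here; the sketch below shows only the generic gated linear unit mechanism it builds on, where a learned sigmoid gate selectively scales each output channel (all weight shapes and names are illustrative assumptions).

```python
import numpy as np

def sigmoid(x):
    # numerically standard logistic function
    return 1.0 / (1.0 + np.exp(-x))

def gated_linear_unit(x, w_value, w_gate):
    """Generic GLU sketch (not the paper's exact CGLU).

    x:        (tokens, channels_in)
    w_value:  (channels_in, channels_out) -- value projection
    w_gate:   (channels_in, channels_out) -- gate projection
    The sigmoid gate lies in (0, 1), so it selectively suppresses or
    passes each output channel -- the general idea behind gated
    channel interaction.
    """
    return (x @ w_value) * sigmoid(x @ w_gate)

# Toy usage with random weights
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))      # 4 tokens, 8 input channels
w_v = rng.standard_normal((8, 16))
w_g = rng.standard_normal((8, 16))
y = gated_linear_unit(x, w_v, w_g)
print(y.shape)                       # (4, 16)
```

A "compact" variant would shrink the gate's parameter count (e.g. sharing or low-rank projections); the paper's specific parameter-reduction choice is not reproduced here.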