🤖 AI Summary
Manual CT segmentation of retroperitoneal tumors is time-consuming and yields inaccurate volumetric estimates due to their irregular shapes and proximity to critical anatomical structures. To address this, we propose ViLU-Net—a lightweight U-Net variant integrating xLSTM into the encoder for efficient long-range dependency modeling and Vi-blocks to enhance local-global feature interaction, circumventing the high computational cost of standard Transformers. ViLU-Net is trained and validated on a proprietary retroperitoneal tumor CT dataset and public organ segmentation benchmarks. Experimental results demonstrate that ViLU-Net achieves state-of-the-art performance with a Dice coefficient of 0.89, 37% reduced GPU memory consumption, and 2.1× faster inference speed—outperforming mainstream U-Net variants across all metrics. The source code is publicly available.
📝 Abstract
The retroperitoneum hosts a variety of tumors, including rare benign and malignant types, which pose diagnostic and treatment challenges due to their infrequency and proximity to vital structures. Estimating tumor volume is difficult due to their irregular shapes, and manual segmentation is time-consuming. Automatic segmentation using U-Net and its variants, incorporating Vision Transformer (ViT) elements, has shown promising results but struggles with high computational demands. To address this, architectures like the Mamba State Space Model (SSM) and Extended Long-Short Term Memory (xLSTM) offer efficient solutions by handling long-range dependencies with lower resource consumption. This study evaluates U-Net enhancements, including CNN, ViT, Mamba, and xLSTM, on a new in-house CT dataset and a public organ segmentation dataset. The proposed ViLU-Net model integrates Vi-blocks for improved segmentation. Results highlight xLSTM's efficiency in the U-Net framework. The code is publicly accessible on GitHub.