Geometrical Cross-Attention and Nonvoid Voxelization for Efficient 3D Medical Image Segmentation

📅 2026-04-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of balancing accuracy and efficiency in 3D medical image segmentation across multiple anatomical structures and imaging modalities. The authors propose GCNV-Net, which introduces a tri-directional dynamic nonvoid voxel Transformer (3DNVT) to model 3D spatial dependencies. It further incorporates geometrical cross-attention (GCA), which explicitly fuses geometric positional information during multi-scale feature integration, and adopts a nonvoid voxelization strategy that processes only information-dense regions, substantially reducing redundant computation. Evaluated on five benchmarks (BraTS2021, ACDC, MSD Prostate, MSD Pancreas, and AMOS2022), GCNV-Net achieves state-of-the-art performance, improving Dice, IoU, and NSD over the best existing methods by 0.65%, 0.63%, and 1%, respectively, while reducing HD95 by a relative 14.5%. Compared with conventional voxelization, nonvoid voxelization also reduces FLOPs by 56.13% and inference latency by 68.49%.
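The summary reports overlap metrics (Dice, IoU), distance-based metrics (HD95, NSD), and relative cost reductions. As a reference point only, the following Python sketch computes Dice and IoU with their standard definitions on binary masks stored as sets of voxel coordinates, and converts a percentage reduction into a speedup factor. The function names are hypothetical, the page does not say how the authors implement these metrics, and HD95/NSD are omitted because they require surface-distance computations.

```python
from typing import Set, Tuple

Voxel = Tuple[int, int, int]  # (z, y, x) index of a foreground voxel


def dice(pred: Set[Voxel], gt: Set[Voxel]) -> float:
    """Dice = 2|P ∩ G| / (|P| + |G|); returns 1.0 when both masks are empty."""
    if not pred and not gt:
        return 1.0
    return 2 * len(pred & gt) / (len(pred) + len(gt))


def iou(pred: Set[Voxel], gt: Set[Voxel]) -> float:
    """IoU (Jaccard) = |P ∩ G| / |P ∪ G|; returns 1.0 when both masks are empty."""
    union = pred | gt
    if not union:
        return 1.0
    return len(pred & gt) / len(union)


def relative_reduction(baseline: float, new: float) -> float:
    """Fraction by which `new` is lower than `baseline` (0.5 means 50% lower)."""
    return (baseline - new) / baseline


def speedup_from_reduction(reduction: float) -> float:
    """Convert a fractional cost reduction r into a speedup factor 1 / (1 - r)."""
    return 1.0 / (1.0 - reduction)


if __name__ == "__main__":
    pred = {(0, 0, 0), (0, 0, 1), (0, 1, 0)}
    gt = {(0, 0, 1), (0, 1, 0), (1, 1, 1)}
    print(f"Dice={dice(pred, gt):.3f}  IoU={iou(pred, gt):.3f}")  # Dice=0.667  IoU=0.500
    print(f"latency speedup ~ {speedup_from_reduction(0.6849):.2f}x")  # ~ 3.17x
    print(f"FLOPs factor ~ {speedup_from_reduction(0.5613):.2f}x")  # ~ 2.28x
```

Read this way, a 68.49% latency reduction leaves 31.51% of the baseline latency, roughly a 3.17× speedup, and a 56.13% FLOPs reduction leaves 43.87% of the baseline FLOPs, roughly a 2.28× factor.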
📝 Abstract
Accurate segmentation of 3D medical scans is crucial for clinical diagnostics and treatment planning, yet existing methods often fail to achieve both high accuracy and computational efficiency across diverse anatomies and imaging modalities. To address these challenges, we propose GCNV-Net, a novel 3D medical segmentation framework that integrates a Tri-directional Dynamic Nonvoid Voxel Transformer (3DNVT), a Geometrical Cross-Attention module (GCA), and Nonvoid Voxelization. The 3DNVT dynamically partitions relevant voxels along the three orthogonal anatomical planes, namely the transverse, sagittal, and coronal planes, enabling effective modeling of complex 3D spatial dependencies. The GCA mechanism explicitly incorporates geometric positional information during multi-scale feature fusion, significantly enhancing fine-grained anatomical segmentation accuracy. Meanwhile, Nonvoid Voxelization processes only informative regions, greatly reducing redundant computation without compromising segmentation quality, and achieves a 56.13% reduction in FLOPs and a 68.49% reduction in inference latency compared to conventional voxelization. We evaluate GCNV-Net on multiple widely used benchmarks: BraTS2021, ACDC, MSD Prostate, MSD Pancreas, and AMOS2022. Our method achieves state-of-the-art segmentation performance across all datasets, outperforming the best existing methods by 0.65% on Dice, 0.63% on IoU, and 1% on NSD, and reducing HD95 by a relative 14.5%. These results demonstrate that GCNV-Net effectively balances accuracy and efficiency, and its robustness across diverse organs, disease conditions, and imaging modalities highlights strong potential for clinical deployment.
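To make the nonvoid-voxelization and tri-directional-partitioning ideas concrete, here is a minimal Python sketch under loudly stated assumptions: "nonvoid" is taken to mean intensity above a fixed threshold, the volume is indexed (z, y, x) with the hypothetical plane mapping in `PLANES`, and each direction simply groups the kept voxels by slice index. None of these names or choices come from the paper; the abstract does not specify how 3DNVT chooses its dynamic partitions, and the sketch implements neither the Transformer nor GCA.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Volume = List[List[List[float]]]     # intensities indexed as volume[z][y][x]
Voxel = Tuple[int, int, int, float]  # (z, y, x, intensity)

# Assumed axis-to-plane mapping for a (z, y, x) volume; real scans need their
# orientation metadata to decide this.
PLANES = {"transverse": 0, "coronal": 1, "sagittal": 2}


def nonvoid_voxelize(volume: Volume, threshold: float = 0.0) -> List[Voxel]:
    """Keep only voxels above `threshold` (a hypothetical 'nonvoid' criterion)."""
    kept: List[Voxel] = []
    for z, plane in enumerate(volume):
        for y, row in enumerate(plane):
            for x, value in enumerate(row):
                if value > threshold:
                    kept.append((z, y, x, value))
    return kept


def tri_directional_partition(voxels: List[Voxel]) -> Dict[str, Dict[int, List[Voxel]]]:
    """For each orthogonal plane, group nonvoid voxels by their slice index."""
    groups: Dict[str, Dict[int, List[Voxel]]] = {}
    for plane, axis in PLANES.items():
        by_slice: Dict[int, List[Voxel]] = defaultdict(list)
        for voxel in voxels:
            by_slice[voxel[axis]].append(voxel)
        groups[plane] = dict(by_slice)
    return groups


def within_slice_pairs(slices: Dict[int, List[Voxel]]) -> int:
    """Token pairs needed if attention is restricted to voxels sharing a slice."""
    return sum(len(group) ** 2 for group in slices.values())


if __name__ == "__main__":
    volume = [
        [[0.0, 0.8], [0.0, 0.0]],
        [[0.0, 0.0], [0.5, 0.9]],
    ]
    voxels = nonvoid_voxelize(volume)
    total = sum(len(row) for plane in volume for row in plane)
    print(f"kept {len(voxels)} of {total} voxels")  # kept 3 of 8 voxels
    parts = tri_directional_partition(voxels)
    for plane, slices in parts.items():
        print(plane, sorted(slices), within_slice_pairs(slices))
    print("dense pairs:", total ** 2)  # 64
```

On this toy 2×2×2 volume only 3 of 8 voxels survive, and restricting attention to voxels that share a slice needs 5 token pairs per direction instead of 64 for dense all-to-all attention over the full grid. This illustrates why skipping empty regions can cut computation; it is not a reproduction of the reported 56.13% FLOPs reduction.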
Problem

Research questions and friction points this paper is trying to address.

3D medical image segmentation
computational efficiency
anatomical accuracy
imaging modalities
voxel processing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Geometrical Cross-Attention
Nonvoid Voxelization
3D Medical Image Segmentation
Dynamic Voxel Transformer
Efficient Inference
Chenxin Yuan
Shenzhen Institute for Advanced Study, University of Electronic Science and Technology of China, Shenzhen, 518100, Guangdong, China
Shoupeng Chen
Shenzhen Institute for Advanced Study, University of Electronic Science and Technology of China, Shenzhen, 518100, Guangdong, China
Haojiang Ye
School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen, Shenzhen, 518172, Guangdong, China
Yiming Miao
Shenzhen Institute for Advanced Study, University of Electronic Science and Technology of China
Limei Peng
Kyungpook National University, South Korea
Pin-Han Ho
University of Waterloo