Johnson-Lindenstrauss Lemma Guided Network for Efficient 3D Medical Segmentation

📅 2025-09-26
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Lightweight 3D medical image segmentation faces a fundamental trade-off between efficiency and robustness, particularly under complex anatomical structures and heterogeneous imaging modalities, where feature representations remain fragile. To address this, we propose VeloxSeg, a novel dual-stream CNN-Transformer collaborative architecture. It integrates Johnson-Lindenstrauss-guided lightweight convolution (JLC), paired-window attention, spatially decoupled knowledge transfer, and Gram-matrix-driven self-supervised knowledge distillation to enable efficient multimodal feature fusion and robust representation learning. Evaluated on mainstream 3D multimodal benchmarks, VeloxSeg improves the Dice coefficient by 26% over prior lightweight methods while attaining 11× higher GPU throughput and 48× faster CPU inference than state-of-the-art lightweight models, breaking through the performance bottleneck of lightweight segmentation architectures.


๐Ÿ“ Abstract
Lightweight 3D medical image segmentation remains constrained by a fundamental "efficiency/robustness conflict", particularly when processing complex anatomical structures and heterogeneous modalities. In this paper, we study how to redesign the framework based on the characteristics of high-dimensional 3D images, and explore data synergy to overcome the fragile representations of lightweight methods. Our approach, VeloxSeg, begins with a deployable and extensible dual-stream CNN-Transformer architecture composed of Paired Window Attention (PWA) and Johnson-Lindenstrauss lemma-guided convolution (JLC). For each 3D image, we invoke a "glance-and-focus" principle, where PWA rapidly retrieves multi-scale information and JLC ensures robust local feature extraction with minimal parameters, significantly enhancing the model's ability to operate on a low computational budget. We then extend the dual-stream architecture to incorporate modal interaction into the multi-scale image-retrieval process, enabling VeloxSeg to model heterogeneous modalities efficiently. Finally, Spatially Decoupled Knowledge Transfer (SDKT) via Gram matrices injects the texture prior extracted by a self-supervised network into the segmentation network, yielding stronger representations than baselines at no extra inference cost. Experimental results on multimodal benchmarks show that VeloxSeg achieves a 26% Dice improvement, alongside increasing GPU throughput by 11× and CPU throughput by 48×. Code is available at https://github.com/JinPLu/VeloxSeg.
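The JLC module's name invokes the Johnson-Lindenstrauss lemma: a random linear projection into k = O(ε⁻² log n) dimensions preserves pairwise Euclidean distances up to a factor of 1 ± ε, which is why an aggressively reduced channel dimension can still retain discriminative structure. A minimal NumPy sketch of that guarantee (the paper's actual JLC convolution design is not reproduced here; the dimensions, ε, and Gaussian projection are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

n, d = 20, 1000                            # 20 feature vectors in 1000 dimensions
eps = 0.5                                  # allowed relative distortion
k = int(np.ceil(8 * np.log(n) / eps**2))   # JL target dimension (96 here)

X = rng.standard_normal((n, d))
# Gaussian random projection, scaled so distances are preserved in expectation
P = rng.standard_normal((d, k)) / np.sqrt(k)
Y = X @ P

def pairwise_dists(A):
    # Euclidean distances between all distinct row pairs
    diff = A[:, None, :] - A[None, :, :]
    return np.sqrt((diff ** 2).sum(-1))[np.triu_indices(len(A), 1)]

orig, proj = pairwise_dists(X), pairwise_dists(Y)
distortion = np.abs(proj / orig - 1.0)
print(f"k={k}, max pairwise distortion={distortion.max():.3f}")
```

Even after shrinking 1000 dimensions to fewer than 100, the pairwise geometry of the point set survives with small distortion, which is the intuition behind using JL-style projections to justify very thin convolutional feature maps.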
Problem

Research questions and friction points this paper is trying to address.

Addresses the efficiency-robustness conflict in lightweight 3D medical segmentation
Redesigns the framework around the characteristics of high-dimensional 3D images
Models heterogeneous modalities with minimal parameters and computation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual-stream CNN-Transformer with PWA and JLC modules
Modal interaction in multi-scale image-retrieval process
Spatially Decoupled Knowledge Transfer via Gram matrices
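SDKT transfers texture priors through Gram matrices, the channel-correlation statistic familiar from style transfer: matching the student's Gram matrix to a self-supervised teacher's injects texture structure without any inference-time cost. A hedged sketch of such a matching loss (the mean-squared objective, layer shapes, and naming are assumptions for illustration; the paper's spatial decoupling scheme is not reproduced):

```python
import numpy as np

def gram(feat):
    """Channel-wise Gram matrix of a (C, D, H, W) feature map, normalized by size."""
    c = feat.shape[0]
    f = feat.reshape(c, -1)           # flatten spatial dims: (C, D*H*W)
    return (f @ f.T) / f.shape[1]     # (C, C) channel-correlation matrix

def gram_distill_loss(teacher_feat, student_feat):
    """MSE between teacher and student Gram matrices (hypothetical SDKT-style loss)."""
    return float(((gram(teacher_feat) - gram(student_feat)) ** 2).mean())

rng = np.random.default_rng(0)
t = rng.standard_normal((8, 4, 16, 16))   # stand-in self-supervised teacher features
s = rng.standard_normal((8, 4, 16, 16))   # stand-in segmentation student features

loss = gram_distill_loss(t, s)    # positive: student textures differ from teacher's
zero = gram_distill_loss(t, t)    # identical features give zero loss
```

Because the Gram matrix discards spatial arrangement and keeps only channel co-activation statistics, the distilled signal describes texture rather than location, which is what makes it a natural vehicle for transferring texture priors into a segmentation backbone.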