Masked LoGoNet: Fast and Accurate 3D Image Analysis for Medical Domain

📅 2024-02-09
🏛️ Knowledge Discovery and Data Mining
📈 Citations: 3
Influential: 0
📄 PDF
🤖 AI Summary
To address the dual bottlenecks of scarce annotated data and high computational overhead in medical 3D image analysis, this paper proposes LoGoNet—a lightweight, efficient U-shaped network—and an accompanying 3D self-supervised learning framework. Methodologically, we introduce a novel feature extractor integrating Large Kernel Attention (LKA) with a dual-path encoder; design a multi-task self-supervised paradigm jointly leveraging masked reconstruction and contrastive learning to substantially reduce annotation dependency; and support unified pre-training for both ViT- and CNN-based backbones. Evaluated on BTCV and MSD benchmarks, LoGoNet outperforms eight state-of-the-art models, achieving segmentation accuracy gains of 2.3–5.1% and accelerating inference by 37–62%. Notably, it significantly enhances robustness for irregular organ segmentation—e.g., spleen—demonstrating strong generalization under limited supervision.

Technology Category

Application Category

📝 Abstract
Standard modern machine-learning-based imaging methods have faced challenges in medical applications due to the high cost of dataset construction and, thereby, the limited labeled training data available. Additionally, upon deployment, these methods are usually used to process a large volume of data on a daily basis, imposing a high maintenance cost on medical facilities. In this paper, we introduce a new neural network architecture, termed LoGoNet, with a tailored self-supervised learning (SSL) method to mitigate such challenges. LoGoNet integrates a novel feature extractor within a U-shaped architecture, leveraging Large Kernel Attention (LKA) and a dual encoding strategy to capture both long-range and short-range feature dependencies adeptly. This is in contrast to existing methods that rely on increasing network capacity to enhance feature extraction. This combination of novel techniques in our model is especially beneficial in medical image segmentation, given the difficulty of learning intricate and often irregular body organ shapes, such as the spleen. Complementary, we propose a novel SSL method tailored for 3D images to compensate for the lack of large labeled datasets. The method combines masking and contrastive learning techniques within a multi-task learning framework and is compatible with both Vision Transformer (ViT) and CNN-based models. We demonstrate the efficacy of our methods in numerous tasks across two standard datasets (i.e., BTCV and MSD). Benchmark comparisons with eight state-of-the-art models highlight LoGoNet's superior performance in both inference time and accuracy.
Problem

Research questions and friction points this paper is trying to address.

Addresses high cost and limited labeled data in medical imaging.
Reduces maintenance costs for processing large medical datasets.
Improves 3D medical image segmentation accuracy and efficiency.
Innovation

Methods, ideas, or system contributions that make the work stand out.

LoGoNet integrates U-shaped architecture with LKA.
Self-supervised learning tailored for 3D medical images.
Combines masking and contrastive learning in multi-task framework.
🔎 Similar Papers
Amin Karimi Monsefi
Amin Karimi Monsefi
Ph.D. student at The Ohio State University
Computer VisionGenerative AIDiffusion Models
P
Payam Karisani
University of Illinois at Urbana-Champaign, USA
M
Mengxi Zhou
The Ohio State University, Columbus, Ohio, USA
S
Stacey S. Choi
The Ohio State University, Columbus, Ohio, USA
Nathan Doble
Nathan Doble
The Ohio State University, Columbus, Ohio, USA
Heng Ji
Heng Ji
Professor of Computer Science, AICE Director, ASKS Director, UIUC, Amazon Scholar
Natural Language ProcessingLarge Language Models
S
Srinivasan Parthasarathy
The Ohio State University, Columbus, Ohio, USA
R
R. Ramnath
The Ohio State University, Columbus, Ohio, USA