MAESIL: Masked Autoencoder for Enhanced Self-supervised Medical Image Learning

📅 2026-04-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing self-supervised learning approaches often reduce 3D medical images, such as CT scans, to independent 2D slices, discarding axial coherence and the rich contextual information inherent in their three-dimensional structure, which limits representational capacity. To address this limitation, this work proposes MAESIL, a novel framework that introduces "superpatches" as fundamental 3D input units, preserving full spatial context while remaining computationally efficient. MAESIL further incorporates a dual-masking strategy within a 3D masked autoencoder to strengthen structural awareness and spatial representation learning. Experiments on three large-scale public CT datasets show that MAESIL significantly outperforms baseline methods (AE, VAE, and VQ-VAE) in reconstruction quality, as measured by PSNR and SSIM.
📝 Abstract
Training deep learning models for three-dimensional (3D) medical imaging, such as Computed Tomography (CT), is fundamentally challenged by the scarcity of labeled data. While pre-training on natural images is common, it results in a significant domain shift, limiting performance. Self-Supervised Learning (SSL) on unlabeled medical data has emerged as a powerful solution, but prominent frameworks often fail to exploit the inherent 3D nature of CT scans. These methods typically process 3D scans as a collection of independent 2D slices, an approach that fundamentally discards critical axial coherence and 3D structural context. To address this limitation, we propose the Masked Autoencoder for Enhanced Self-supervised medical Image Learning (MAESIL), a novel self-supervised learning framework designed to capture 3D structural information efficiently. The core innovation is the 'superpatch', a 3D chunk-based input unit that balances 3D context preservation with computational efficiency. Our framework partitions the volume into superpatches and employs a 3D masked autoencoder with a dual-masking strategy to learn comprehensive spatial representations. We validated our approach on three diverse large-scale public CT datasets. Our experimental results show that MAESIL demonstrates significant improvements over existing methods such as AE, VAE, and VQ-VAE in key reconstruction metrics such as PSNR and SSIM. This establishes MAESIL as a robust and practical pre-training solution for 3D medical imaging tasks.
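The abstract names two mechanisms but gives no implementation details. As a rough illustration of what they could look like, here is a minimal NumPy sketch of (a) partitioning a volume into non-overlapping 3D superpatches and (b) drawing two independent random patch masks as a stand-in for the dual-masking strategy. The patch size, masking ratios, and all function names here are assumptions for illustration, not the paper's actual configuration.

```python
import numpy as np

def partition_superpatches(volume, patch=(16, 16, 16)):
    """Split a 3D volume of shape (D, H, W) into non-overlapping 3D
    superpatches, returning an array of shape (n_patches, pd, ph, pw).

    Dimensions are assumed to divide evenly by the patch size; a real
    pipeline would pad or resample the scan first.
    """
    d, h, w = volume.shape
    pd, ph, pw = patch
    return (volume
            .reshape(d // pd, pd, h // ph, ph, w // pw, pw)
            .transpose(0, 2, 4, 1, 3, 5)   # group the three patch axes last
            .reshape(-1, pd, ph, pw))

def dual_mask(n_patches, ratio_a=0.6, ratio_b=0.3, rng=None):
    """Draw two independent Boolean masks over the patch sequence.

    This is only a placeholder for the paper's dual-masking strategy;
    the actual ratios and sampling scheme are not specified in the
    abstract.
    """
    rng = rng or np.random.default_rng()
    mask_a = rng.random(n_patches) < ratio_a
    mask_b = rng.random(n_patches) < ratio_b
    return mask_a, mask_b

# Example: a 64^3 CT-like volume yields 4 * 4 * 4 = 64 superpatches of 16^3.
vol = np.zeros((64, 64, 64), dtype=np.float32)
patches = partition_superpatches(vol)
print(patches.shape)  # (64, 16, 16, 16)
```

In a masked-autoencoder setup, the visible patches (those not masked) would be fed to the encoder and the decoder would be trained to reconstruct the masked ones; the superpatch unit keeps each token a genuine 3D chunk rather than a 2D slice.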
Problem

Research questions and friction points this paper is trying to address.

3D medical imaging
self-supervised learning
domain shift
structural context
CT scans
Innovation

Methods, ideas, or system contributions that make the work stand out.

Masked Autoencoder
Self-Supervised Learning
3D Medical Imaging
Superpatch
Dual-Masking Strategy
👥 Authors
Kyeonghun Kim (OUTTA)
Hyeonseok Jung (Chung-Ang University)
Youngung Han (Seoul National University)
Junsu Lim (Sangmyung University)
YeonJu Jean (Ewha Womans University)
Seongbin Park (Seoul National University)
Eunseob Choi (GIST)
Hyunsu Go (Seoul National University)
SeoYoung Ju (Sangmyung University)
Seohyoung Park (Ewha Womans University)
Gyeongmin Kim (Chung-Ang University)
MinJu Kwon (Chung-Ang University)
KyungSeok Yuh (Dankook University)
Soo Yong Kim (AI Matics)
Ken Ying-Kai Liao (NVIDIA ATC, Taiwan)
Nam-Joon Kim (Seoul National University)
Hyuk-Jae Lee (Seoul National University, Department of Electrical and Computer Engineering)
Research areas: Artificial Intelligence, Memory Architecture, Autonomous Driving, Image Processing