MAESIL: Masked Autoencoder for Enhanced Self-supervised Medical Image Learning

📅 2026-04-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing self-supervised learning approaches often reduce 3D medical images, such as CT scans, to independent 2D slices, discarding axial coherence and the rich contextual information inherent in their three-dimensional structure, which limits representational capacity. To address this limitation, this work proposes MAESIL, a novel framework that introduces "superpatches" as fundamental 3D input units, preserving full spatial context while remaining computationally efficient. MAESIL further incorporates a dual-masking strategy within a 3D masked autoencoder to strengthen structural awareness and spatial representation learning. Experiments on three large-scale public CT datasets show that MAESIL significantly outperforms baseline methods (AE, VAE, and VQ-VAE) in reconstruction quality, as measured by PSNR and SSIM.
📝 Abstract
Training deep learning models for three-dimensional (3D) medical imaging, such as Computed Tomography (CT), is fundamentally challenged by the scarcity of labeled data. While pre-training on natural images is common, it results in a significant domain shift, limiting performance. Self-Supervised Learning (SSL) on unlabeled medical data has emerged as a powerful solution, but prominent frameworks often fail to exploit the inherent 3D nature of CT scans. These methods typically process 3D scans as a collection of independent 2D slices, an approach that fundamentally discards critical axial coherence and 3D structural context. To address this limitation, we propose the Masked Autoencoder for Enhanced Self-supervised medical Image Learning (MAESIL), a novel self-supervised learning framework designed to capture 3D structural information efficiently. The core innovation is the 'superpatch', a 3D chunk-based input unit that balances 3D context preservation with computational efficiency. Our framework partitions the volume into superpatches and employs a 3D masked autoencoder with a dual-masking strategy to learn comprehensive spatial representations. We validated our approach on three diverse large-scale public CT datasets. Our experimental results show that MAESIL demonstrates significant improvements over existing methods such as AE, VAE, and VQ-VAE in key reconstruction metrics such as PSNR and SSIM. This establishes MAESIL as a robust and practical pre-training solution for 3D medical imaging tasks.
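The abstract names two mechanisms but gives no implementation details. As a rough illustration of what they could look like, here is a minimal NumPy sketch of (a) partitioning a volume into non-overlapping 3D superpatches and (b) drawing two independent random patch masks as a stand-in for the dual-masking strategy. The patch size, masking ratios, and all function names here are assumptions for illustration, not the paper's actual configuration.

```python
import numpy as np

def partition_superpatches(volume, patch=(16, 16, 16)):
    """Split a 3D volume of shape (D, H, W) into non-overlapping 3D
    superpatches, returning an array of shape (n_patches, pd, ph, pw).

    Dimensions are assumed to divide evenly by the patch size; a real
    pipeline would pad or resample the scan first.
    """
    d, h, w = volume.shape
    pd, ph, pw = patch
    return (volume
            .reshape(d // pd, pd, h // ph, ph, w // pw, pw)
            .transpose(0, 2, 4, 1, 3, 5)   # group the three patch axes last
            .reshape(-1, pd, ph, pw))

def dual_mask(n_patches, ratio_a=0.6, ratio_b=0.3, rng=None):
    """Draw two independent Boolean masks over the patch sequence.

    This is only a placeholder for the paper's dual-masking strategy;
    the actual ratios and sampling scheme are not specified in the
    abstract.
    """
    rng = rng or np.random.default_rng()
    mask_a = rng.random(n_patches) < ratio_a
    mask_b = rng.random(n_patches) < ratio_b
    return mask_a, mask_b

# Example: a 64^3 CT-like volume yields 4 * 4 * 4 = 64 superpatches of 16^3.
vol = np.zeros((64, 64, 64), dtype=np.float32)
patches = partition_superpatches(vol)
print(patches.shape)  # (64, 16, 16, 16)
```

In a masked-autoencoder setup, the visible patches (those not masked) would be fed to the encoder and the decoder would be trained to reconstruct the masked ones; the superpatch unit keeps each token a genuine 3D chunk rather than a 2D slice.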
Problem

Research questions and friction points this paper is trying to address.

3D medical imaging
self-supervised learning
domain shift
structural context
CT scans
Innovation

Methods, ideas, or system contributions that make the work stand out.

Masked Autoencoder
Self-Supervised Learning
3D Medical Imaging
Superpatch
Dual-Masking Strategy
👥 Authors
Kyeonghun Kim (OUTTA)
Hyeonseok Jung (Chung-Ang University)
Youngung Han (Seoul National University)
Junsu Lim (Sangmyung University)
YeonJu Jean (Ewha Womans University)
Seongbin Park (Seoul National University)
Eunseob Choi (GIST)
Hyunsu Go (Seoul National University)
SeoYoung Ju (Sangmyung University)
Seohyoung Park (Ewha Womans University)
Gyeongmin Kim (Chung-Ang University)
MinJu Kwon (Chung-Ang University)
KyungSeok Yuh (Dankook University)
Soo Yong Kim (AI Matics)
Ken Ying-Kai Liao (NVIDIA ATC, Taiwan)
Nam-Joon Kim (Seoul National University)
Hyuk-Jae Lee (Seoul National University, Department of Electrical and Computer Engineering)
Research areas: Artificial Intelligence, Memory Architecture, Autonomous Driving, Image Processing