Improving Representation of High-frequency Components for Medical Foundation Models

📅 2024-07-19
🏛️ arXiv.org
📈 Citations: 1
✨ Influential: 0
📄 PDF
🤖 AI Summary
Medical foundation models exhibit inherent limitations in representing high-frequency components and fine anatomical structures, hindering performance on tasks involving complex boundaries and sub-visual features. To address this, we propose the Frequency-advanced Representation Autoencoder (Frepa), a high-frequency-aware self-supervised pretraining framework for medical imaging. Frepa models frequency-domain information via high-frequency masking and low-frequency perturbation, introduces histogram-equalized image masking (a novel strategy enabling mask-based self-supervision on both Swin Transformers and CNNs), and integrates adversarial learning to enhance high-frequency fidelity. Evaluated across 32 diverse medical downstream tasks, Frepa achieves state-of-the-art performance without fine-tuning: up to a +15% Dice score gain for retinal vessel segmentation and +7% IoU for pulmonary nodule detection. Crucially, it significantly improves the reconstruction of high-frequency detail from the learned embedding space.

๐Ÿ“ Abstract
Foundation models have recently attracted significant attention for their impressive generalizability across diverse downstream tasks. However, these models have been shown to exhibit significant limitations in representing high-frequency components and fine-grained details. In many medical imaging tasks, the precise representation of such information is crucial due to the inherently intricate anatomical structures, sub-visual features, and complex boundaries involved. Consequently, the limited representation of prevalent foundation models can result in significant performance degradation or even failure in these tasks. To address these challenges, we propose a novel pretraining strategy, named Frequency-advanced Representation Autoencoder (Frepa). Through high-frequency masking and low-frequency perturbation combined with adversarial learning, Frepa encourages the encoder to effectively represent and preserve high-frequency components in the image embeddings. Additionally, we introduce an innovative histogram-equalized image masking strategy, extending the Masked Autoencoder approach beyond ViT to other architectures such as Swin Transformer and convolutional networks. We develop Frepa across nine medical modalities and validate it on 32 downstream tasks for both 2D images and 3D volume data. Without fine-tuning, Frepa can outperform other self-supervised pretraining methods and, in some cases, even surpasses task-specific trained models. This improvement is particularly significant for tasks involving fine-grained details, such as achieving up to a +15% increase in DSC for retinal vessel segmentation and a +7% increase in IoU for lung nodule detection. Further experiments quantitatively reveal that Frepa enables superior high-frequency representation and preservation in the embeddings, underscoring its potential for developing more generalized and universal medical image foundation models.
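The core pretext task described above corrupts each training image by masking its high-frequency band and perturbing its low-frequency band, then asks the autoencoder to reconstruct the original. A minimal NumPy sketch of that corruption step is shown below; the circular Fourier-domain cutoff, the `radius` parameter, and the Gaussian perturbation are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def split_frequencies(image, radius=16):
    """Split an image into low- and high-frequency components using a
    circular mask in the centered 2D Fourier domain (illustrative cutoff;
    the paper's exact frequency-separation scheme may differ)."""
    f = np.fft.fftshift(np.fft.fft2(image))
    h, w = image.shape
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h / 2) ** 2 + (xx - w / 2) ** 2)
    low_mask = dist <= radius
    low = np.fft.ifft2(np.fft.ifftshift(f * low_mask)).real
    high = np.fft.ifft2(np.fft.ifftshift(f * ~low_mask)).real
    return low, high  # low + high reconstructs the input (up to float error)

def corrupt(image, radius=16, noise_std=0.1, rng=None):
    """High-frequency masking plus low-frequency perturbation: discard the
    high band entirely and add Gaussian noise to the low band, yielding the
    corrupted input the autoencoder must reconstruct the original from."""
    if rng is None:
        rng = np.random.default_rng(0)
    low, _ = split_frequencies(image, radius)
    return low + rng.normal(0.0, noise_std, image.shape)
```

Because the two Fourier masks are complementary, `low + high` recovers the original image exactly, so the reconstruction target is well defined.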
Problem

Research questions and friction points this paper is trying to address.

Foundation models under-represent high-frequency components in medical images.
Fine-grained details and complex boundaries are poorly preserved in learned embeddings.
This degrades performance on medical tasks requiring precise anatomical structure.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Frequency-advanced Representation Autoencoder (Frepa), a novel self-supervised pretraining strategy
High-frequency masking and low-frequency perturbation combined with adversarial learning
Histogram-equalized image masking, extending masked-autoencoder pretraining beyond ViT to Swin Transformers and CNNs
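The histogram-equalized masking idea above replaces zeroed-out patches, which break the input statistics that convolutional and windowed architectures rely on, with transformed but dense patches. The sketch below is one plausible reading of that strategy; the patch size, mask ratio, and per-patch equalization are hypothetical choices, not the paper's exact recipe.

```python
import numpy as np

def hist_equalize(patch, bins=256):
    """Histogram-equalize a patch with values in [0, 1] by mapping each
    pixel through the patch's empirical CDF."""
    flat = patch.ravel()
    hist, edges = np.histogram(flat, bins=bins, range=(0.0, 1.0))
    cdf = hist.cumsum().astype(np.float64)
    cdf /= cdf[-1]
    return np.interp(flat, edges[:-1], cdf).reshape(patch.shape)

def hist_equalized_masking(image, patch=8, mask_ratio=0.5, rng=None):
    """Replace randomly selected patches with histogram-equalized versions
    instead of zeros, so Swin/CNN encoders still receive dense, statistically
    plausible inputs (illustrative reading of the strategy; details may
    differ from the paper)."""
    if rng is None:
        rng = np.random.default_rng(0)
    out = image.copy()
    h, w = image.shape
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            if rng.random() < mask_ratio:
                out[i:i + patch, j:j + patch] = hist_equalize(
                    out[i:i + patch, j:j + patch])
    return out
```

Unlike zero-masking, every "masked" region here still carries a valid intensity distribution, which is why the strategy transfers to architectures without a patch-token interface.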
👥 Authors
Yuetan Chu, Center of Excellence Applications (CoE) on Smart Health, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
Yilan Zhang, King Abdullah University of Science and Technology
Zhongyi Han, Professor, Shandong University
Changchun Yang, KAUST
Longxi Zhou, Center of Excellence Applications (CoE) on Smart Health, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
Gongning Luo, Center of Excellence Applications (CoE) on Smart Health, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
Xin Gao, Center of Excellence Applications (CoE) on Smart Health, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia