Leveraging the Structure of Medical Data for Improved Representation Learning

📅 2025-07-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address weak model generalization caused by scarce annotations and limited sample sizes in medical imaging datasets such as MIMIC-CXR, this paper proposes a structure-aware self-supervised pretraining framework. Methodologically, it treats paired posteroanterior (PA) and lateral chest radiographs as natural positive pairs for contrastive learning, eliminating reliance on textual annotations, and integrates masked image modeling, cross-view representation alignment, and sparse image-patch reconstruction to enable lightweight, modality-agnostic domain adaptation. The core contribution is the explicit encoding of the geometric and anatomical structural constraints inherent in multi-view medical images. Experiments demonstrate that the proposed approach significantly outperforms both supervised baselines and structure-agnostic self-supervised methods on MIMIC-CXR, validating the critical role of structural priors in improving representation learning for small-sample medical AI.
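The cross-view contrastive idea summarized above can be sketched as a symmetric InfoNCE loss in which the PA and lateral embeddings of the same study are positives and all other pairings in the batch are negatives. This is an illustrative NumPy sketch, not the authors' implementation; the embedding dimension and temperature are arbitrary assumptions.

```python
import numpy as np

def info_nce(z_pa, z_lat, temperature=0.1):
    """Symmetric InfoNCE over paired view embeddings.

    z_pa, z_lat: (batch, dim) arrays; row i of each array comes from
    the same patient study, so positives lie on the diagonal of the
    similarity matrix and every other pair acts as a negative.
    """
    # L2-normalize so the dot product is cosine similarity
    z_pa = z_pa / np.linalg.norm(z_pa, axis=1, keepdims=True)
    z_lat = z_lat / np.linalg.norm(z_lat, axis=1, keepdims=True)
    logits = z_pa @ z_lat.T / temperature   # (batch, batch) similarities
    labels = np.arange(len(z_pa))           # matching pairs on the diagonal

    def xent(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # average the PA->lateral and lateral->PA directions
    return 0.5 * (xent(logits) + xent(logits.T))
```

Correctly paired batches should yield a lower loss than batches whose pairing is scrambled, which is what drives the encoder to align the two views.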

📝 Abstract
Building generalizable medical AI systems requires pretraining strategies that are data-efficient and domain-aware. Unlike internet-scale corpora, clinical datasets such as MIMIC-CXR offer limited image counts and scarce annotations, but exhibit rich internal structure through multi-view imaging. We propose a self-supervised framework that leverages the inherent structure of medical datasets. Specifically, we treat paired chest X-rays (i.e., frontal and lateral views) as natural positive pairs, learning to reconstruct each view from sparse patches while aligning their latent embeddings. Our method requires no textual supervision and produces informative representations. Evaluated on MIMIC-CXR, we show strong performance compared to supervised objectives and to baselines trained without leveraging structure. This work provides a lightweight, modality-agnostic blueprint for domain-specific pretraining where data is structured but scarce.

Problem

Research questions and friction points this paper is trying to address.

Improving medical AI with data-efficient pretraining strategies
Leveraging multi-view X-ray structure for self-supervised learning
Enhancing representation learning in scarce annotated medical datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-supervised framework using multi-view X-rays
Reconstructs views from sparse patches
Aligns latent embeddings without text supervision
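
The sparse-patch reconstruction ingredient listed above follows the masked-image-modeling recipe: tile the image into non-overlapping patches and expose only a small random subset to the encoder. Below is a minimal NumPy sketch of that patchify-and-mask step; the patch size, keep ratio, and function name are illustrative assumptions, not values from the paper.

```python
import numpy as np

def sparse_patchify(image, patch=4, keep_ratio=0.25, seed=0):
    """Split a 2-D image into non-overlapping patches and keep only a
    sparse random subset, as in masked image modeling. Returns the
    visible patches, their indices, and the total patch count; the
    model must reconstruct the remaining (masked) patches.
    """
    h, w = image.shape
    # (h//p, p, w//p, p) -> (n_patches, p*p) with rows as flattened patches
    patches = image.reshape(h // patch, patch, w // patch, patch)
    patches = patches.transpose(0, 2, 1, 3).reshape(-1, patch * patch)
    n = len(patches)
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)[: max(1, int(n * keep_ratio))]
    return patches[idx], idx, n
```

In the full objective, an encoder would embed the visible patches of one view and be trained both to reconstruct the masked patches and to align its output with the embedding of the paired view.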
👥 Authors

Andrea Agostini
Department of Computer Science, ETH Zurich, Zurich, Switzerland
Sonia Laguna
PhD student, ETH Zürich
Machine Learning, Generative Models, Interpretability
Alain Ryser
PhD Student, ETH
Computer Science, Medical Data Science, Machine Learning
Samuel Ruiperez-Campillo
Department of Computer Science, ETH Zurich, Zurich, Switzerland
Moritz Vandenhirtz
PhD student, ETH Zurich
Generative Modeling, Interpretable Machine Learning, Computer Vision, Medical Data Science
Nicolas Deperrois
Department of Quantitative Biomedicine, University of Zurich, Zurich, Switzerland
Farhad Nooralahzadeh
Department of Quantitative Biomedicine, University of Zurich, Zurich, Switzerland
Michael Krauthammer
University of Zurich
Biomedical Informatics
Thomas M. Sutter
Postdoc, ETH Zurich
Generative Models, Multimodal ML, Probabilistic ML, Representation Learning, ML for Healthcare
Julia E. Vogt
Department of Computer Science, ETH Zurich, Zurich, Switzerland