Stochastic Siamese MAE Pretraining for Longitudinal Medical Images

📅 2025-12-29
🤖 AI Summary
Existing self-supervised methods such as Masked Autoencoding (MAE) lack temporal awareness, which limits their ability to model disease progression in longitudinal medical imaging (e.g., OCT and MRI). To address this, we propose STAMP, a stochastic Siamese masked autoencoding framework that reformulates the MAE reconstruction objective as a conditional variational inference task, capturing the intrinsic uncertainty of disease evolution via random time-difference sampling and joint masked reconstruction. Our approach integrates Siamese Vision Transformers, MAE, and Conditional Variational Autoencoders (CVAEs), in contrast to deterministic Siamese approaches that compare scans from different time points. Evaluated on multi-visit OCT (late-stage age-related macular degeneration) and MRI (Alzheimer's disease) datasets, the pretrained models outperform temporal MAE baselines and general-purpose foundation models on progression prediction tasks, which require learning non-deterministic longitudinal representations.
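The pair sampling and masking step mentioned in the summary can be illustrated with a minimal sketch. It assumes one patient's visits are given as a sorted list of days since baseline and that masking is MAE-style uniform random patch masking. The function names, the 75% mask ratio, and the visit times are hypothetical and not taken from the paper.

```python
import random


def sample_visit_pair(visit_days, rng):
    """Pick two distinct visits of one patient.

    visit_days must be sorted in ascending order.
    Returns (earlier_index, later_index, time_difference_in_days).
    """
    i, j = sorted(rng.sample(range(len(visit_days)), 2))
    return i, j, visit_days[j] - visit_days[i]


def random_mask(num_patches, mask_ratio, rng):
    """Return sorted indices of the patches hidden from the encoder."""
    num_masked = int(round(num_patches * mask_ratio))
    return sorted(rng.sample(range(num_patches), num_masked))


rng = random.Random(0)
visit_days = [0, 92, 187, 365]  # hypothetical visit times (days since baseline)
i, j, delta = sample_visit_pair(visit_days, rng)
masked = random_mask(num_patches=16, mask_ratio=0.75, rng=rng)
```

Because the pair is drawn at random on every training step, the same patient contributes many different time differences, which is what lets a model be conditioned on that gap rather than on a fixed visit interval.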

📝 Abstract
Temporally aware image representations are crucial for capturing disease progression in 3D volumes of longitudinal medical datasets. However, recent state-of-the-art self-supervised learning approaches like Masked Autoencoding (MAE), despite their strong representation learning capabilities, lack temporal awareness. In this paper, we propose STAMP (Stochastic Temporal Autoencoder with Masked Pretraining), a Siamese MAE framework that encodes temporal information through a stochastic process by conditioning on the time difference between the two input volumes. Unlike deterministic Siamese approaches, which compare scans from different time points but fail to account for the inherent uncertainty in disease evolution, STAMP learns temporal dynamics stochastically by reframing the MAE reconstruction loss as a conditional variational inference objective. We evaluated STAMP on two OCT datasets and one MRI dataset, each with multiple visits per patient. STAMP-pretrained ViT models outperformed both existing temporal MAE methods and foundation models on different late-stage Age-Related Macular Degeneration and Alzheimer's Disease progression prediction tasks, which require models to learn the underlying non-deterministic temporal dynamics of the diseases.
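For orientation, a conditional variational inference objective of the kind the abstract describes typically takes the form of a conditional evidence lower bound. The block below is the generic conditional VAE bound, not the paper's exact loss. The symbols are assumptions introduced here: $x_t$ and $x_{t+\Delta}$ for the earlier and later volumes, $\Delta$ for the time difference, and $z$ for a latent variable.

```latex
\log p_\theta(x_{t+\Delta} \mid x_t, \Delta)
  \;\ge\;
  \mathbb{E}_{q_\phi(z \mid x_t,\, x_{t+\Delta},\, \Delta)}
    \bigl[ \log p_\theta(x_{t+\Delta} \mid z,\, x_t,\, \Delta) \bigr]
  \;-\;
  \mathrm{KL}\bigl( q_\phi(z \mid x_t,\, x_{t+\Delta},\, \Delta)
    \,\big\|\, p_\theta(z \mid x_t,\, \Delta) \bigr)
```

Under this reading, the first term plays the role of the MAE reconstruction loss and the KL term keeps the posterior close to a prior that sees only the earlier scan and the time difference. How STAMP parameterizes these distributions is not stated in the abstract.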
Problem

Research questions and friction points this paper is trying to address.

Develops a stochastic Siamese MAE to capture temporal dynamics in longitudinal medical images.
Addresses the lack of temporal awareness in self-supervised learning for disease progression.
Improves prediction of non-deterministic disease evolution in Age-Related Macular Degeneration and Alzheimer's Disease.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Siamese MAE framework with stochastic temporal encoding
Conditional variational inference for disease progression modeling
Outperforms existing temporal MAE methods and foundation models on two OCT datasets and one MRI dataset
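To make the combination of masked reconstruction and conditional variational inference concrete, here is a minimal sketch of a loss of that general shape. It assumes diagonal Gaussian posterior and prior distributions, a mean squared error over masked patches, and a weighting factor `beta`. All of these are illustrative choices made here, not details confirmed by the paper.

```python
import math


def gaussian_kl(mu_q, logvar_q, mu_p, logvar_p):
    """KL(q || p) between diagonal Gaussians, summed over latent dimensions."""
    total = 0.0
    for mq, lq, mp, lp in zip(mu_q, logvar_q, mu_p, logvar_p):
        total += 0.5 * (lp - lq + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp) - 1.0)
    return total


def masked_mse(pred, target, masked_idx):
    """Mean squared error over masked patches only (each patch is a list of floats)."""
    errs = [(p - t) ** 2 for k in masked_idx for p, t in zip(pred[k], target[k])]
    return sum(errs) / len(errs)


def negative_elbo(pred, target, masked_idx, mu_q, logvar_q, mu_p, logvar_p, beta=1.0):
    """Reconstruction error on masked patches plus a beta-weighted KL term."""
    kl = gaussian_kl(mu_q, logvar_q, mu_p, logvar_p)
    return masked_mse(pred, target, masked_idx) + beta * kl


# Toy example: two patches of two values each, only patch 1 is masked.
pred = [[0.0, 0.0], [1.0, 1.0]]
target = [[0.0, 0.0], [0.0, 0.0]]
loss = negative_elbo(pred, target, [1], [1.0], [0.0], [0.0], [0.0])
```

In a real model the posterior parameters would come from an encoder that sees both scans, and the prior parameters from one that sees only the earlier scan and the time difference. Here they are plain lists so the arithmetic stays visible.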
Taha Emre
Medical University of Vienna
Medical Imaging, Computer Vision, Deep Learning

Arunava Chakravarty
Department of Ophthalmology and Optometry, Medical University of Vienna, Austria

Thomas Pinetz
Medical University of Vienna
Generative Adversarial Networks, Medical Image Processing, Optimization, Mathematical Modelling

Dmitrii Lachinov
Institute of Artificial Intelligence, Center for Medical Data Science, Medical University of Vienna, Austria

Martin J. Menten
Technical University of Munich
Machine Learning for Healthcare, Medical Imaging, Computer Vision

Hendrik Scholl
Department of Clinical Pharmacology, Medical University of Vienna, Vienna, Austria, and Pallas Kliniken AG, Pallas Klinik Zürich, Zürich, Switzerland, and European Vision Institute, Basel, Basel-Stadt, Switzerland

Sobha Sivaprasad
Professor of Retinal Clinical Research
Retina

Daniel Rueckert
Technical University of Munich and Imperial College London
Machine Learning, Medical Image Computing, Biomedical Image Analysis, Computer Vision

Andrew Lotery
Faculty of Medicine, University of Southampton, Southampton, Hampshire, United Kingdom

Stefan Sacu
Department of Ophthalmology and Optometry, Medical University of Vienna, Austria

Ursula Schmidt-Erfurth
Ophthalmic Image Analysis Group (OPTIMA), Medical University of Vienna, Austria

Hrvoje Bogunović
Medical University of Vienna, Austria
Medical Image Analysis, Machine Learning, Data Science