Developing a PET/CT Foundation Model for Cross-Modal Anatomical and Functional Imaging

📅 2025-03-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing AI models for PET/CT multimodal analysis suffer from poor generalizability and reliance on small-scale, task-specific training. Method: We propose the first whole-body PET/CT foundation model, FratMAE (Cross-Fraternal Twin Masked Autoencoder), featuring dual-path Vision Transformers for separate PET and CT encoding, cross-modal cross-attention decoding, and integration of textual metadata to enrich representation learning. It employs masked autoencoding and multimodal contrastive pretraining for end-to-end joint representation learning. Contribution/Results: On downstream tasks—including lesion detection, segmentation, and cancer staging prediction—FratMAE significantly outperforms unimodal and conventional multimodal baselines. It demonstrates superior generalizability across diverse clinical sites and scanners, robustness to modality dropout or noise, and strong potential for real-world clinical deployment.
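The multimodal contrastive pretraining mentioned in the summary pairs each PET representation with its associated text. As a minimal sketch (the symmetric InfoNCE form, embedding size, and temperature here are common CLIP-style assumptions, not details taken from the paper), matched PET/text pairs sit on the diagonal of a cosine-similarity matrix and are pulled together while mismatched pairs are pushed apart:

```python
import numpy as np

def info_nce(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    Row i of img_emb and row i of txt_emb are a matched pair, so the
    diagonal of the similarity matrix is the classification target.
    """
    # L2-normalize, then cosine-similarity logits scaled by temperature
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature
    n = logits.shape[0]

    def xent(l):
        # cross-entropy with the diagonal as the target class
        l = l - l.max(axis=1, keepdims=True)          # numerical stability
        log_p = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_p[np.arange(n), np.arange(n)].mean()

    # average image->text and text->image directions
    return 0.5 * (xent(logits) + xent(logits.T))

rng = np.random.default_rng(0)
pet_emb = rng.normal(size=(4, 32))                    # illustrative PET embeddings
txt_emb = pet_emb + 0.01 * rng.normal(size=(4, 32))   # nearly matched text embeddings
loss = info_nce(pet_emb, txt_emb)
print(float(loss))
```

With nearly identical pairs the loss is close to zero; random, unmatched pairs would drive it up, which is the signal that aligns the two embedding spaces during pretraining.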

📝 Abstract
In oncology, Positron Emission Tomography-Computed Tomography (PET/CT) is widely used in cancer diagnosis, staging, and treatment monitoring, as it combines anatomical details from CT with functional metabolic activity and molecular marker expression information from PET. However, existing artificial intelligence-driven PET/CT analyses rely predominantly on task-specific models trained from scratch or on limited datasets, limiting their generalizability and robustness. To address this, we propose a foundation model approach specifically designed for multimodal PET/CT imaging. We introduce the Cross-Fraternal Twin Masked Autoencoder (FratMAE), a novel framework that effectively integrates whole-body anatomical and functional or molecular information. FratMAE employs separate Vision Transformer (ViT) encoders for PET and CT scans, along with cross-attention decoders that enable synergistic interactions between modalities during masked autoencoder training. Additionally, it incorporates textual metadata to enhance PET representation learning. By pre-training on PET/CT datasets, FratMAE captures intricate cross-modal relationships and global uptake patterns, achieving superior performance on downstream tasks and demonstrating its potential as a generalizable foundation model.
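The cross-attention decoding described in the abstract lets tokens from one modality query the other during reconstruction. A minimal NumPy sketch (token counts, embedding size, and single-head form are illustrative assumptions, not the paper's configuration) of PET decoder tokens attending to CT encoder tokens via scaled dot-product cross-attention:

```python
import numpy as np

def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention: queries from one modality
    attend to keys/values produced by the other modality's encoder."""
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)           # (Nq, Nk) similarity
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over CT tokens
    return weights @ values                          # (Nq, d) fused features

rng = np.random.default_rng(0)
d = 64                                   # illustrative embedding size
pet_tokens = rng.normal(size=(16, d))    # queries from the PET decoder
ct_tokens = rng.normal(size=(32, d))     # keys/values from the CT encoder
fused = cross_attention(pet_tokens, ct_tokens, ct_tokens)
print(fused.shape)  # (16, 64)
```

The symmetric direction (CT queries attending to PET tokens) uses the same function with the arguments swapped; this two-way exchange is what lets each modality's reconstruction draw on the other's context.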
Problem

Research questions and friction points this paper is trying to address.

Existing PET/CT models are task-specific and trained from scratch on small datasets.
Anatomical (CT) and functional (PET) information is rarely integrated during representation learning.
Limited generalizability and robustness hinder use in cancer diagnosis, staging, and treatment monitoring.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cross-Fraternal Twin Masked Autoencoder (FratMAE) framework
Separate Vision Transformer (ViT) encoders for PET and CT
Cross-attention decoders for synergistic modality interaction during masked autoencoder training
Textual metadata incorporated to enhance PET representation learning
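The masked-autoencoder training underlying these contributions hides most patch tokens and encodes only the visible remainder. A rough illustration (the 75% mask ratio and token shapes are standard MAE-style assumptions, not values reported for FratMAE):

```python
import numpy as np

def random_masking(tokens, mask_ratio=0.75, seed=0):
    """MAE-style random masking: keep a random subset of patch tokens.

    Returns the visible tokens plus the kept/masked index sets; the
    reconstruction loss would be computed only at the masked positions.
    """
    n = tokens.shape[0]
    n_keep = int(n * (1 - mask_ratio))
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    keep_idx = np.sort(perm[:n_keep])    # visible patches fed to the encoder
    mask_idx = np.sort(perm[n_keep:])    # hidden patches to be reconstructed
    return tokens[keep_idx], keep_idx, mask_idx

patches = np.arange(200 * 8, dtype=float).reshape(200, 8)  # 200 patch tokens
visible, keep_idx, mask_idx = random_masking(patches)
print(visible.shape)  # (50, 8): only 25% of tokens are encoded
```

Encoding only the visible quarter of the tokens is what makes masked-autoencoder pretraining cheap enough to scale to whole-body volumes; the decoder then fills in the masked positions, here with the help of cross-attention to the other modality.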
Yujin Oh
Harvard Medical School & Massachusetts General Hospital
Medical Image Analysis · Artificial Intelligence · Large Language Model · Multimodal AI
Robert Seifert
Department of Radiology, Massachusetts General Hospital, USA
Yihan Cao
LinkedIn
Christoph Clement
PhD Student, Inselspital Bern
PET · Monolithic Crystals · Deep Learning
Justin Ferdinandus
University Hospital Cologne, University of Cologne, Germany
Constantin Lapa
University Hospital Augsburg, Germany
Alessandro Liebich
University Hospital Augsburg, Germany
Michelle Amon
Department of Nuclear Medicine, Inselspital, University of Bern, Switzerland
Johanna Enke
University Hospital Augsburg, Germany
Sifan Song
Post-Doc, Massachusetts General Hospital
Medical Image Analysis
Runqi Meng
ShanghaiTech University, China
Fang Zeng
Center for Advanced Medical Computing and Analysis (CAMCA), Harvard Medical School and Massachusetts General Hospital, USA
Ning Guo
Center for Advanced Medical Computing and Analysis (CAMCA), Harvard Medical School and Massachusetts General Hospital, USA
Xiang Li
Center for Advanced Medical Computing and Analysis (CAMCA), Harvard Medical School and Massachusetts General Hospital, USA
Pedram Heidari
Center for Advanced Medical Computing and Analysis (CAMCA), Harvard Medical School and Massachusetts General Hospital, USA
Axel Rominger
Professor of Nuclear Medicine
Nuclear Medicine
Kuangyu Shi
University of Bern/Technical University of Munich
Nuclear medicine/Biomedical computing
Quanzheng Li
Massachusetts General Hospital, Harvard Medical School
Image Reconstruction · Medical Image Analysis · Deep Learning in Medicine · Multimodality Medical Data Analysis