An Open Multi-Center Whole-Body FDG PET/CT Foundation Model for Tumor Segmentation

📅 2026-05-20

🤖 AI Summary
This work addresses limitations of existing deep learning approaches for PET/CT imaging, which are largely task-specific, often trained on single-center data, or built on dual-branch architectures that delay cross-modal interaction and so underuse the early spatial correspondence between PET and CT. The authors present an open-source, multi-center foundation model for whole-body FDG PET/CT, built on 4,997 harmonized scans from four public datasets. The architecture uses hierarchical UNet-shaped backbones with early channel-wise concatenation, so anatomical and metabolic features interact from the first embedding layer onward. For pretraining, they introduce a masked autoencoding objective that fills masked regions by zero-mean imputation rather than learnable mask tokens, combined with a weighted global reconstruction loss; this avoids non-physical intensity discontinuities at masked-region boundaries. On downstream AutoPET lesion segmentation, with only 10% of the labeled training data the pretrained models perform comparably to models trained from scratch on the full dataset, and under 5-shot linear probing, joint PET/CT pretraining achieves higher Dice scores than separated-modality pretraining.
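As a rough illustration of early channel-wise concatenation, the NumPy sketch below stacks a co-registered CT volume and PET volume into a single two-channel input, so that even the first layer mixes both modalities at every voxel. The function names, the 1x1x1 linear embedding, and the array sizes are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def early_fuse(ct: np.ndarray, pet: np.ndarray) -> np.ndarray:
    """Stack co-registered CT and PET volumes along a new leading channel axis."""
    if ct.shape != pet.shape:
        raise ValueError("CT and PET must be resampled to the same grid")
    return np.stack([ct, pet], axis=0)  # shape: (2, D, H, W)


def first_layer(x: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Hypothetical 1x1x1 embedding that mixes the modality channels per voxel.

    x has shape (2, D, H, W); weights has shape (out_channels, 2).
    """
    return np.einsum("oc,cdhw->odhw", weights, x)


rng = np.random.default_rng(0)
ct = rng.normal(size=(4, 8, 8))    # stand-in for a normalized CT volume
pet = rng.normal(size=(4, 8, 8))   # stand-in for a normalized PET volume

fused = early_fuse(ct, pet)
weights = rng.normal(size=(16, 2))  # assumed embedding width of 16
features = first_layer(fused, weights)
print(fused.shape, features.shape)
```

Every output channel here is a weighted combination of the CT and PET intensity at the same voxel, which is the sense in which the modalities interact from the first layer. A dual-branch design would instead process `ct` and `pet` separately and merge them only at a later stage.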
📝 Abstract
The synergistic interpretation of anatomical information from computed tomography (CT) and metabolic information from positron emission tomography (PET) is important to oncologic imaging. However, existing deep learning methods for PET/CT remain largely task-specific, are often trained on single-center cohorts, or adopt dual-branch fusion schemes that delay cross-modal interaction and underutilize early spatial correspondence between PET and CT. To address these limitations, we present an open-source, multi-center, whole-body FDG PET/CT foundation model utilizing 4,997 harmonized scans from four public datasets. Our framework employs hierarchical UNet-shaped backbones with early channel-wise concatenation, enabling anatomical and metabolic features to interact from the first embedding layer onward. We further introduce a masked autoencoding objective based on zero-mean imputation, combined with a weighted global reconstruction loss. This design avoids non-physical intensity discontinuities at masked-region boundaries that arise from learnable mask tokens. On downstream AutoPET lesion segmentation, the proposed models demonstrate strong label efficiency: with only 10% of the labeled training data, they achieve performance comparable to models trained from scratch on the full dataset. Under extreme 5-shot linear probing, joint PET/CT pretraining also achieves higher Dice scores than separated-modality pretraining. This multi-center foundation model demonstrates label efficiency and cross-modality representation learning for PET/CT tumor segmentation. It provides a robust, open-source basis for advancing automated oncologic imaging, significantly reducing the need for large-scale manual annotations in clinical practice.
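The masking objective can be sketched as follows, under one plausible reading of "zero-mean imputation": the volume is normalized to zero mean, and masked patches are filled with zero (the mean) instead of a learnable token. The loss is then taken over the whole volume, with masked voxels weighted more heavily than visible ones. The patch size, mask ratio, per-volume normalization, and weight values are hypothetical choices for illustration; the abstract does not specify them.

```python
import numpy as np


def zero_mean_mask(volume, patch=4, mask_ratio=0.75, rng=None):
    """Normalize a volume, then replace randomly chosen patches with zero.

    Returns (target, masked_input, mask), where mask is True on masked voxels.
    """
    rng = np.random.default_rng() if rng is None else rng
    if any(s % patch for s in volume.shape):
        raise ValueError("each dimension must be divisible by the patch size")

    # Assumed per-volume normalization: afterwards the mean intensity is 0.
    normed = (volume - volume.mean()) / (volume.std() + 1e-8)

    # Choose which patches to hide on the coarse patch grid.
    grid = tuple(s // patch for s in volume.shape)
    n_patches = int(np.prod(grid))
    n_masked = int(round(mask_ratio * n_patches))
    flat = np.zeros(n_patches, dtype=bool)
    flat[rng.choice(n_patches, size=n_masked, replace=False)] = True

    # Upsample the patch-level mask to voxel resolution.
    mask = flat.reshape(grid)
    for axis in range(mask.ndim):
        mask = np.repeat(mask, patch, axis=axis)

    # Zero-mean imputation: masked voxels take the (zero) mean value.
    masked_input = np.where(mask, 0.0, normed)
    return normed, masked_input, mask


def weighted_global_mse(pred, target, mask, w_masked=1.0, w_visible=0.1):
    """Weighted MSE over every voxel; the weights are illustrative assumptions."""
    weights = np.where(mask, w_masked, w_visible)
    return float(np.sum(weights * (pred - target) ** 2) / np.sum(weights))


rng = np.random.default_rng(0)
volume = rng.normal(loc=3.0, scale=2.0, size=(8, 8, 8))  # toy single-channel volume
target, masked_input, mask = zero_mean_mask(volume, rng=rng)

# Loss of a trivial "model" that returns its input unchanged.
baseline = weighted_global_mse(masked_input, target, mask)
print(f"masked fraction: {mask.mean():.2f}, baseline loss: {baseline:.3f}")
```

Because the fill value sits at the center of the normalized intensity range, the masked input stays within the same value range as real data. A global loss with a nonzero weight on visible voxels also penalizes the reconstruction across patch boundaries rather than only inside the masked regions.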
Problem

Research questions and friction points this paper is trying to address.

PET/CT
tumor segmentation
cross-modality
foundation model
multi-center
Innovation

Methods, ideas, or system contributions that make the work stand out.

early cross-modal fusion
masked autoencoding
zero-mean imputation
label-efficient learning
multi-center foundation model
Xiaofeng Liu
Assistant Professor, Yale University
Trustworthy AI, Computer Vision, Medical Image Analysis, Data Science, Health Informatics
Qianru Zhang
Department of Radiology and Biomedical Imaging, Yale Biomedical Imaging Institute, Yale University, New Haven, CT, USA
Thibault Marin
Department of Radiology and Biomedical Imaging, Yale Biomedical Imaging Institute, Yale University, New Haven, CT, USA
Menghua Xia
Yale University; Fudan University
medical image analysis
Chi Liu
Department of Radiology and Biomedical Imaging, Yale Biomedical Imaging Institute, Yale University, New Haven, CT, USA
Georges El Fakhri
Yale University
Medical Imaging
Jinsong Ouyang
Associate Professor, Massachusetts General Hospital and Harvard Medical School
Medical Imaging and Deep Learning