A Data-Efficient Pan-Tumor Foundation Model for Oncology CT Interpretation

Date: 2025-02-10

AI Summary
To address the scarcity of high-quality annotated data for tumor CT image analysis, which is constrained by privacy concerns and high annotation costs, this work introduces PASTA, a pan-tumor CT foundation model. Methodologically, the authors (1) propose PASTA-Gen, a synthetic tumor generation framework that produces 30,000 high-fidelity CT volumes with pixel-level lesion annotations and paired structured radiology reports, and (2) design a vision-language joint pretraining paradigm that integrates cross-modal transfer learning and few-shot fine-tuning. PASTA supports 46 diverse clinical tasks, including segmentation, detection, staging, survival prediction, and report generation, and achieves state-of-the-art performance on 45 of them, with statistically significant improvements over the second-best models on 35. It attains substantial performance gains using only a small amount of real-world data. Both the PASTA model and the synthetic dataset are openly released.

Abstract
Artificial intelligence-assisted imaging analysis has made substantial strides in tumor diagnosis and management. Here we present PASTA, a pan-tumor CT foundation model that achieves state-of-the-art performance on 45 of 46 representative oncology tasks, including lesion segmentation, tumor detection in plain CT, tumor staging, survival prediction, structured report generation, and cross-modality transfer learning, and significantly outperforms the second-best models on 35 tasks. This remarkable advancement is driven by our development of PASTA-Gen, an innovative synthetic tumor generation framework that produces a comprehensive dataset of 30,000 CT scans with pixel-level annotated lesions and paired structured reports, encompassing malignancies across ten organs and five benign lesion types. By leveraging this rich, high-quality synthetic data, we overcome a longstanding bottleneck in the development of CT foundation models: the scarcity of publicly available, high-quality annotated datasets due to privacy constraints and the substantial labor required to scale precise data annotation. Encouragingly, PASTA demonstrates exceptional data efficiency with promising practical value, markedly improving performance on various tasks with only a small amount of real-world data. The open release of both the synthetic dataset and the PASTA foundation model addresses the challenge of data scarcity, thereby advancing oncological research and clinical translation.

Problem

Develops a pan-tumor CT foundation model
Overcomes the scarcity of annotated oncology CT datasets
Improves data efficiency in oncology CT image analysis

Innovation

Pan-tumor CT foundation model (PASTA)
Synthetic tumor generation framework (PASTA-Gen)
Data-efficient oncology CT interpretation

Authors

Wenhui Lei, University of Pennsylvania
Hanyu Chen, Department of Surgical Oncology and General Surgery, Key Laboratory of Precision Diagnosis and Treatment of Gastrointestinal Tumors, Ministry of Education, The First Hospital of China Medical University, Liaoning, P.R. China
Zitian Zhang, Department of Radiology, The First Hospital of China Medical University, Liaoning, P.R. China
Luyang Luo, Harvard Medical School, Boston, MA, USA
Qiong Xiao, Department of Surgical Oncology and General Surgery, Key Laboratory of Precision Diagnosis and Treatment of Gastrointestinal Tumors, Ministry of Education, The First Hospital of China Medical University, Liaoning, P.R. China
Yannian Gu, Shanghai Jiao Tong University
Peng Gao, Department of Surgical Oncology and General Surgery, Key Laboratory of Precision Diagnosis and Treatment of Gastrointestinal Tumors, Ministry of Education, The First Hospital of China Medical University, Liaoning, P.R. China
Yankai Jiang, Shanghai AI Lab, Shanghai, P.R. China
Ci Wang, Department of Radiology, The First Hospital of China Medical University, Liaoning, P.R. China
Guangtao Wu, Department of Surgical Oncology and General Surgery, Key Laboratory of Precision Diagnosis and Treatment of Gastrointestinal Tumors, Ministry of Education, The First Hospital of China Medical University, Liaoning, P.R. China
Tongjia Xu, Department of Surgical Oncology and General Surgery, Key Laboratory of Precision Diagnosis and Treatment of Gastrointestinal Tumors, Ministry of Education, The First Hospital of China Medical University, Liaoning, P.R. China
Yingjie Zhang, Department of Surgical Oncology and General Surgery, Key Laboratory of Precision Diagnosis and Treatment of Gastrointestinal Tumors, Ministry of Education, The First Hospital of China Medical University, Liaoning, P.R. China
Xiaofan Zhang, Shanghai Jiao Tong University, Shanghai, P.R. China; Shanghai AI Lab, Shanghai, P.R. China
Pranav Rajpurkar, Harvard Medical School, Boston, MA, USA
Shaoting Zhang, Shanghai AI Lab; SenseTime Research
Zhenning Wang, Staff Engineer, Alibaba Cloud