A Data-Efficient Pan-Tumor Foundation Model for Oncology CT Interpretation

📅 2025-02-10

🤖 AI Summary

To address the scarcity of high-quality annotated data for tumor CT image analysis, which is constrained by privacy concerns and high annotation costs, this work introduces PASTA, a pan-tumor CT foundation model. Methodologically: (1) it introduces PASTA-Gen, a synthetic data framework that generates 30,000 high-fidelity CT volumes with pixel-level lesion annotations and paired structured radiology reports, covering malignancies in ten organs and five benign lesion types; (2) it designs a vision-language joint pretraining paradigm, combined with cross-modal transfer learning and few-shot fine-tuning for downstream adaptation. In terms of results, PASTA supports 46 diverse clinical tasks, including segmentation, detection, staging, survival prediction, and report generation, and achieves state-of-the-art performance on 45 of them, with statistically significant improvements over the second-best models on 35. It also attains substantial performance gains using only a small amount of real-world data. Both the PASTA model and the synthetic dataset are openly released.
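The summary does not describe how PASTA-Gen actually synthesizes lesions, so the Python sketch below is only a toy illustration (not PASTA-Gen's method) of what one synthetic sample with a pixel-level annotation and a paired structured report could look like. It pastes a uniform spherical "lesion" into a noisy background volume, derives the binary voxel mask from the same geometry, and builds the report from that mask so label and report cannot disagree. It also includes the Dice coefficient, a common segmentation metric, to show how such a mask serves as a training or evaluation label. All function names, Hounsfield-unit (HU) values, organ and lesion labels, and report fields are hypothetical.

```python
import json
import random
from dataclasses import dataclass, asdict


# Hypothetical structured-report schema; the real PASTA-Gen report fields are not given in the summary.
@dataclass
class LesionReport:
    organ: str
    lesion_type: str
    center_zyx: tuple
    radius_vox: int
    volume_vox: int
    mean_hu: float


def make_background(shape, base_hu=40.0, noise_hu=10.0, seed=0):
    """Toy soft-tissue background: constant HU plus Gaussian noise, indexed [z][y][x]."""
    rng = random.Random(seed)
    d, h, w = shape
    return [[[base_hu + rng.gauss(0.0, noise_hu) for _ in range(w)]
             for _ in range(h)] for _ in range(d)]


def insert_spherical_lesion(volume, center, radius, lesion_hu):
    """Overwrite voxels inside a sphere with a lesion intensity and return the binary mask."""
    d, h, w = len(volume), len(volume[0]), len(volume[0][0])
    cz, cy, cx = center
    mask = [[[0] * w for _ in range(h)] for _ in range(d)]
    for z in range(d):
        for y in range(h):
            for x in range(w):
                if (z - cz) ** 2 + (y - cy) ** 2 + (x - cx) ** 2 <= radius ** 2:
                    volume[z][y][x] = lesion_hu
                    mask[z][y][x] = 1
    return mask


def build_report(volume, mask, organ, lesion_type, center, radius):
    """Derive the structured report from the mask so the text matches the voxel label."""
    lesion_values = [
        volume[z][y][x]
        for z in range(len(mask))
        for y in range(len(mask[0]))
        for x in range(len(mask[0][0]))
        if mask[z][y][x]
    ]
    mean_hu = round(sum(lesion_values) / len(lesion_values), 1) if lesion_values else 0.0
    return LesionReport(organ, lesion_type, tuple(center), radius, len(lesion_values), mean_hu)


def dice(a, b):
    """Dice coefficient between two binary 3-D masks of equal shape."""
    inter = total = 0
    for plane_a, plane_b in zip(a, b):
        for row_a, row_b in zip(plane_a, plane_b):
            for va, vb in zip(row_a, row_b):
                inter += va * vb
                total += va + vb
    return 2.0 * inter / total if total else 1.0


if __name__ == "__main__":
    volume = make_background((16, 32, 32))
    center, radius = (8, 16, 16), 4
    mask = insert_spherical_lesion(volume, center, radius, lesion_hu=-10.0)
    report = build_report(volume, mask, "liver", "hypodense mass", center, radius)
    print(json.dumps(asdict(report)))
    print(dice(mask, mask))  # 1.0
```

Deriving the report from the mask, rather than writing it separately, is one simple way to keep image, label, and text consistent; a realistic generator would also need plausible lesion shape, texture, and anatomical placement, which this sketch ignores.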

📝 Abstract

Artificial intelligence-assisted imaging analysis has made substantial strides in tumor diagnosis and management. Here we present PASTA, a pan-tumor CT foundation model that achieves state-of-the-art performance on 45 of 46 representative oncology tasks, including lesion segmentation, tumor detection in plain (non-contrast) CT, tumor staging, survival prediction, structured report generation, and cross-modality transfer learning, and significantly outperforms the second-best models on 35 of them. This advance is driven by PASTA-Gen, our synthetic tumor generation framework, which produces a dataset of 30,000 CT scans with pixel-level annotated lesions and paired structured reports, spanning malignancies across ten organs and five benign lesion types. By leveraging this high-quality synthetic data, we overcome a longstanding bottleneck in developing CT foundation models: the scarcity of publicly available, high-quality annotated datasets, caused by privacy constraints and the substantial labor required to annotate data precisely at scale. PASTA is also highly data-efficient and shows promising practical value, markedly improving performance on various tasks with only a small amount of real-world data. The open release of both the synthetic dataset and the PASTA foundation model addresses the challenge of data scarcity, thereby advancing oncological research and clinical translation.

Problem

Research questions and friction points this paper is trying to address.

Develops pan-tumor CT foundation model

Overcomes the scarcity of annotated oncology CT datasets

Improves data efficiency in CT image analysis

Innovation

Methods, ideas, or system contributions that make the work stand out.

Pan-tumor CT foundation model

Synthetic tumor generation framework

Data-efficient oncology CT interpretation
