FreeTumor: Large-Scale Generative Tumor Synthesis in Computed Tomography Images for Improving Tumor Recognition

📅 2025-02-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
Clinical adoption of AI for tumor detection in CT imaging is hindered by the severe scarcity of high-quality annotated data. Method: This paper introduces FreeTumor, a large-scale generative framework for tumor synthesis in CT that combines limited labeled data with abundant unlabeled CT scans for synthesis training. It curates the largest multi-source dataset for CT tumor synthesis and recognition to date (161,310 volumes from 33 sources, only 2.3% of which contain annotated tumors). Results: Synthetic tumors expand the recognition training set by over 40×, yielding significant performance gains across multiple tumor recognition tasks and surpassing state-of-the-art synthesis methods and foundation models. In a Visual Turing Test, 13 board-certified radiologists achieved only 51.1% sensitivity and 60.8% accuracy in distinguishing synthetic from real tumors, near chance level, indicating the synthesized tumors are clinically indistinguishable.

📝 Abstract
Tumors are a leading cause of death worldwide, with an estimated 10 million deaths attributed to tumor-related diseases every year. AI-driven tumor recognition unlocks new possibilities for more precise and intelligent tumor screening and diagnosis. However, progress is heavily hampered by the scarcity of annotated datasets, which demand extensive annotation effort from radiologists. To tackle this challenge, we introduce FreeTumor, an innovative Generative AI (GAI) framework that enables large-scale tumor synthesis to mitigate data scarcity. Specifically, FreeTumor effectively leverages a combination of limited labeled data and large-scale unlabeled data for tumor synthesis training. Unleashing the power of large-scale data, FreeTumor is capable of synthesizing a large number of realistic tumors on images to augment training datasets. To this end, we create the largest training dataset for tumor synthesis and recognition by curating 161,310 publicly available Computed Tomography (CT) volumes from 33 sources, with only 2.3% containing annotated tumors. To validate the fidelity of the synthetic tumors, we engaged 13 board-certified radiologists in a Visual Turing Test to discern synthetic from real tumors. Rigorous clinician evaluation validates the high quality of our synthetic tumors: the radiologists achieved only 51.1% sensitivity and 60.8% accuracy in distinguishing our synthetic tumors from real ones. Through high-quality tumor synthesis, FreeTumor scales up the recognition training datasets by over 40 times, showing notable superiority over state-of-the-art AI methods, including various synthesis methods and foundation models. These findings indicate promising prospects for FreeTumor in clinical applications, potentially advancing tumor treatment and improving patient survival rates.
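The core idea described above is to convert abundant unlabeled (tumor-free) CT scans into labeled training pairs by synthesizing lesions into them. The sketch below is a minimal, non-authoritative illustration of that synthesis-as-augmentation loop, not FreeTumor's actual generative model: the `synthesize_tumor` function and its intensity parameters are hypothetical stand-ins that simply darken a masked region (tumors are often hypodense on CT) to show how each unlabeled slice yields an image/mask pair for recognition training.

```python
import numpy as np

rng = np.random.default_rng(0)

def synthesize_tumor(ct_slice, mask, intensity_shift=-60.0, noise_std=8.0):
    """Illustrative mask-conditioned 'synthesis': lower the attenuation of
    the masked region and add texture noise. A real system (like FreeTumor)
    would use a trained generative model here instead."""
    out = ct_slice.astype(np.float32).copy()
    out[mask] += intensity_shift + noise_std * rng.standard_normal(mask.sum())
    return out

def augment_dataset(unlabeled_slices, mask):
    """Turn unlabeled (tumor-free) slices into labeled synthetic pairs."""
    return [(synthesize_tumor(s, mask), mask) for s in unlabeled_slices]

# Toy 64x64 "CT" slices around liver-like attenuation (~60 HU).
slices = [60.0 + rng.standard_normal((64, 64)) for _ in range(5)]
yy, xx = np.mgrid[:64, :64]
mask = (yy - 32) ** 2 + (xx - 32) ** 2 < 10 ** 2  # circular lesion mask

pairs = augment_dataset(slices, mask)
print(len(pairs))  # each unlabeled slice becomes one labeled training pair
```

Every synthetic pair comes with a pixel-exact mask for free, which is what lets the recognition training set grow far beyond the 2.3% of volumes with human annotations.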
Problem

Research questions and friction points this paper is trying to address.

Severe scarcity of annotated tumor data
Heavy annotation burden on radiologists
Limited tumor recognition accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generative AI for tumor synthesis
Combines labeled and unlabeled data
Scales recognition training data by over 40×
Linshan Wu
Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Hong Kong, China.
Jiaxin Zhuang
PhD in CSE, HKUST
Computer Vision · Medical Image Analysis · Artificial Intelligence
Yanning Zhou
XPENG
Computer Vision · Medical Image Analysis
Sunan He
Hong Kong University of Science and Technology
Multi-Modal Learning
Jiabo Ma
Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Hong Kong, China.
Luyang Luo
Department of Biomedical Informatics, Harvard University, Boston, USA.
Xi Wang
Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China.
Xuefeng Ni
Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Hong Kong, China.
Xiaoling Zhong
Department of Radiology, Shenzhen People’s Hospital, Shenzhen, China.
Mingxiang Wu
Department of Radiology, Shenzhen People’s Hospital, Shenzhen, China.
Yinghua Zhao
Department of Radiology, The Third Affiliated Hospital of Southern Medical University, Guangzhou, China.
Xiaohui Duan
Department of Radiology, Sun Yat-Sen Memorial Hospital, Sun Yat-Sen University, Guangzhou, China.
Varut Vardhanabhuti
Department of Diagnostic Radiology, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong, China.
Pranav Rajpurkar
Department of Biomedical Informatics, Harvard University, Boston, USA.
Hao Chen
Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Hong Kong, China., Department of Chemical and Biological Engineering, The Hong Kong University of Science and Technology, Hong Kong, China., Division of Life Science, The Hong Kong University of Science and Technology, Hong Kong, China., State Key Laboratory of Molecular Neuroscience, The Hong Kong University of Science and Technology, Hong Kong, China., Shenzhen-Hong Kong Collaborative Innovation Research In