A computationally frugal open-source foundation model for thoracic disease detection in lung cancer screening programs

2025-07-02
AI Summary
Low-dose computed tomography (LDCT) analysis for lung cancer screening is hindered by radiologist shortages, high computational costs, proprietary constraints, and the poor generalizability of existing AI models. Method: We propose TANGERINE, a lightweight, open-source 3D vision foundation model that extends the masked autoencoder framework to volumetric LDCT imaging. It is pretrained with self-supervised learning on over 98,000 thoracic LDCT scans drawn from the UK's largest Lung Cancer Screening (LCS) initiative to date and 27 public datasets. Contribution/Results: TANGERINE achieves state-of-the-art performance across 14 disease classification tasks, including lung cancer and multiple respiratory diseases, and generalizes robustly across clinical centres. Relative to models trained from scratch, it converges quickly during fine-tuning, requiring significantly fewer GPU hours, and shows strong label efficiency, achieving comparable or superior performance with a fraction of the fine-tuning data. Its open, lightweight design provides a scalable, reproducible foundation for large-scale early disease detection with limited computational resources.
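Label-efficiency comparisons like the one summarised here are typically built by fine-tuning on progressively smaller fractions of the labelled training set and comparing scores at each fraction. A minimal sketch of one common way to draw such subsets, a class-stratified random fraction that keeps at least one example per class, is shown here; the helper `stratified_fraction`, the fractions, and the toy label array are hypothetical, and the paper's own sampling protocol is not described in this summary.

```python
import numpy as np

def stratified_fraction(labels, fraction, seed=0):
    """Sorted indices of a class-stratified random subset holding ~`fraction` of each class.

    At least one example per class is kept so that rare findings never vanish from a subset.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = np.random.default_rng(seed)
    chosen = []
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)          # positions of this class
        k = max(1, int(round(fraction * len(idx))))  # per-class sample size
        chosen.append(rng.choice(idx, size=k, replace=False))
    return np.sort(np.concatenate(chosen))

labels = np.array([0] * 90 + [1] * 10)   # hypothetical binary task with 10% positives
for frac in (1.0, 0.2, 0.1):
    subset = stratified_fraction(labels, frac)
    print(frac, len(subset), int(labels[subset].sum()))
```

Fine-tuning both a pretrained encoder and a from-scratch baseline on each subset, and scoring both on the same held-out test set, gives the kind of comparison the summary describes.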

Abstract
Low-dose computed tomography (LDCT) imaging employed in lung cancer screening (LCS) programs is increasing in uptake worldwide. LCS programs herald a generational opportunity to simultaneously detect cancer and non-cancer-related early-stage lung disease. Yet these efforts are hampered by a shortage of radiologists to interpret scans at scale. Here, we present TANGERINE, a computationally frugal, open-source vision foundation model for volumetric LDCT analysis. Designed for broad accessibility and rapid adaptation, TANGERINE can be fine-tuned off the shelf for a wide range of disease-specific tasks with limited computational resources and training data. Relative to models trained from scratch, TANGERINE demonstrates fast convergence during fine-tuning, thereby requiring significantly fewer GPU hours, and displays strong label efficiency, achieving comparable or superior performance with a fraction of the fine-tuning data. Pretrained using self-supervised learning on over 98,000 thoracic LDCTs drawn from the UK's largest LCS initiative to date and 27 public datasets, TANGERINE achieves state-of-the-art performance across 14 disease classification tasks, including lung cancer and multiple respiratory diseases, while generalising robustly across diverse clinical centres. By extending a masked autoencoder framework to 3D imaging, TANGERINE offers a scalable solution for LDCT analysis, departing from recent closed, resource-intensive models by combining architectural simplicity, public availability, and modest computational requirements. Its accessible, open-source, lightweight design lays the foundation for rapid integration into next-generation medical imaging tools that could transform LCS initiatives, allowing them to pivot from a singular focus on lung cancer detection to comprehensive respiratory disease management in high-risk populations.
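Extending a masked autoencoder (MAE) to 3D rests on two steps: cutting a CT volume into non-overlapping cubic patches, then hiding a large random subset of them so the encoder sees only the remainder. The NumPy sketch below illustrates those two steps under assumptions not taken from the paper: a 16-voxel cubic patch, a 75% mask ratio (the default of the original 2D MAE), a synthetic volume in place of a real LDCT, and the hypothetical helper names `patchify_3d` and `random_mask`. TANGERINE's actual patch size, masking ratio, and preprocessing may differ.

```python
import numpy as np

def patchify_3d(volume, patch):
    """Split a (D, H, W) volume into non-overlapping cubes of side `patch`.

    Returns shape (num_patches, patch**3): one flattened cube per row, in D, H, W raster order.
    """
    d, h, w = volume.shape
    if d % patch or h % patch or w % patch:
        raise ValueError("each volume dimension must be divisible by the patch size")
    nd, nh, nw = d // patch, h // patch, w // patch
    cubes = volume.reshape(nd, patch, nh, patch, nw, patch)
    cubes = cubes.transpose(0, 2, 4, 1, 3, 5)   # -> (nd, nh, nw, p, p, p)
    return cubes.reshape(nd * nh * nw, patch ** 3)

def random_mask(num_patches, mask_ratio, rng):
    """MAE-style uniform random masking; returns sorted (visible, masked) patch indices."""
    num_keep = int(round(num_patches * (1.0 - mask_ratio)))
    perm = rng.permutation(num_patches)
    return np.sort(perm[:num_keep]), np.sort(perm[num_keep:])

rng = np.random.default_rng(0)
ct = rng.standard_normal((64, 128, 128)).astype(np.float32)  # synthetic stand-in for a resampled LDCT crop
patches = patchify_3d(ct, patch=16)                           # 4 * 8 * 8 = 256 patches of 16**3 voxels
visible, masked = random_mask(len(patches), mask_ratio=0.75, rng=rng)
encoder_input = patches[visible]                              # only these reach the encoder
print(patches.shape, encoder_input.shape, masked.shape)       # (256, 4096) (64, 4096) (192,)
```

Since the encoder only processes the visible patches (64 of 256 in this example), each pretraining step is considerably cheaper than encoding the whole volume, which is the usual source of MAE-style efficiency.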
Problem

Detect thoracic diseases in LDCT scans efficiently
Address radiologist shortage in large-scale scan interpretation
Provide open-source, low-resource model for diverse disease tasks
Innovation

Open-source lightweight model for LDCT analysis
Self-supervised pretraining on diverse datasets
Efficient fine-tuning with minimal GPU resources
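In a standard MAE, self-supervised pretraining trains a lightweight decoder to reconstruct the hidden patches, and the reconstruction loss is counted only at masked positions. The sketch below shows that objective in NumPy, assuming TANGERINE follows the original MAE recipe, including its optional per-patch target normalisation (`norm_pix`); the function name `mae_loss` and the toy arrays are hypothetical, and the paper may use a different loss.

```python
import numpy as np

def mae_loss(pred, target, mask, norm_pix=True, eps=1e-6):
    """Mean squared reconstruction error, averaged over masked patches only.

    pred, target: arrays of shape (N, P), one row of P voxel values per patch.
    mask: shape (N,), 1.0 where the patch was hidden from the encoder, 0.0 where visible.
    norm_pix: if True, standardise each target patch by its own mean and variance.
    """
    if norm_pix:
        mean = target.mean(axis=-1, keepdims=True)
        var = target.var(axis=-1, keepdims=True)
        target = (target - mean) / np.sqrt(var + eps)
    per_patch = ((pred - target) ** 2).mean(axis=-1)     # (N,) error per patch
    return float((per_patch * mask).sum() / mask.sum())  # visible patches contribute nothing

# Toy check: a large error on a visible patch is ignored; only masked patches count.
target = np.arange(12, dtype=np.float64).reshape(3, 4)
pred = target.copy()
pred[0] += 5.0                      # visible patch, excluded from the loss
pred[2] += 1.0                      # masked patch, squared error 1.0 per voxel
mask = np.array([0.0, 1.0, 1.0])    # patches 1 and 2 were masked
print(mae_loss(pred, target, mask, norm_pix=False))  # 0.5
```

Restricting the loss to masked patches forces the model to infer missing anatomy from context rather than copy visible voxels, which is what makes the pretrained encoder useful for downstream fine-tuning.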
Niccolò McConnell
Hawkes Institute, University College London, UK; Department of Computer Science, University College London, UK; Institute of Health Informatics, University College London, UK
Pardeep Vasudev
Hawkes Institute, University College London, UK; Department of Computer Science, University College London, UK
Daisuke Yamada
Hawkes Institute, University College London, UK; Department of Computer Science, University College London, UK
Daryl Cheng
Hawkes Institute, University College London, UK; Department of Respiratory Medicine, University College London, UK
Mehran Azimbagirad
Hawkes Institute, University College London, UK; Department of Computer Science, University College London, UK
John McCabe
Hawkes Institute, University College London, UK; Lungs for Living Research Centre, UCL Respiratory, University College London, London, UK
Shahab Aslani
Hawkes Institute, University College London, UK; Department of Respiratory Medicine, University College London, UK
Ahmed H. Shahin
AstraZeneca
Machine Learning, Medical Image Analysis
Yukun Zhou
Hawkes Institute, University College London, UK; Institute of Health Informatics, University College London, UK
The SUMMIT Consortium
Summit Consortium authors and affiliations listed at end of file
Andre Altmann
UCL
Computational Biology, Machine Learning, Neuroimaging, imaging genetics
Yipeng Hu
Department of Medical Physics and Biomedical Engineering, University College London, UK
Paul Taylor
Institute of Health Informatics, University College London, UK
Sam M. Janes
Department of Respiratory Medicine, University College London, UK; Lungs for Living Research Centre, UCL Respiratory, University College London, London, UK
Daniel C. Alexander
Professor of Imaging Science, Centre for Medical Image Computing, Department of Computer Science
Computer science, Machine learning, Medical imaging, diffusion MRI, Neuroscience
Joseph Jacob
University College London
Radiology, lung disease, computed tomography