A computationally frugal open-source foundation model for thoracic disease detection in lung cancer screening programs

📅 2025-07-02

🤖 AI Summary

Problem: Low-dose computed tomography (LDCT) analysis for lung cancer screening is hindered by radiologist shortages, high computational costs, proprietary constraints, and poor generalizability of existing AI models. Method: We propose TANGERINE, a lightweight, open-source 3D vision foundation model that extends a masked autoencoder framework to volumetric imaging. It is pretrained with self-supervised learning on over 98,000 thoracic LDCT scans, drawn from the UK's largest lung cancer screening (LCS) initiative to date and 27 public datasets. Contribution/Results: TANGERINE achieves state-of-the-art performance across 14 disease classification tasks, including lung cancer and multiple respiratory diseases, and generalizes robustly across clinical centers. Relative to models trained from scratch, it converges faster during fine-tuning, requiring significantly fewer GPU hours, and reaches comparable or superior performance with a fraction of the fine-tuning data. Its open, lightweight design offers a scalable, reproducible foundation for LDCT analysis where computational resources and labeled data are limited.

📝 Abstract

Low-dose computed tomography (LDCT) imaging employed in lung cancer screening (LCS) programs is increasing in uptake worldwide. LCS programs herald a generational opportunity to simultaneously detect cancer and non-cancer-related early-stage lung disease. Yet these efforts are hampered by a shortage of radiologists to interpret scans at scale. Here, we present TANGERINE, a computationally frugal, open-source vision foundation model for volumetric LDCT analysis. Designed for broad accessibility and rapid adaptation, TANGERINE can be fine-tuned off the shelf for a wide range of disease-specific tasks with limited computational resources and training data. Relative to models trained from scratch, TANGERINE demonstrates fast convergence during fine-tuning, thereby requiring significantly fewer GPU hours, and displays strong label efficiency, achieving comparable or superior performance with a fraction of fine-tuning data. Pretrained using self-supervised learning on over 98,000 thoracic LDCTs, including the UK's largest LCS initiative to date and 27 public datasets, TANGERINE achieves state-of-the-art performance across 14 disease classification tasks, including lung cancer and multiple respiratory diseases, while generalising robustly across diverse clinical centres. By extending a masked autoencoder framework to 3D imaging, TANGERINE offers a scalable solution for LDCT analysis, departing from recent closed, resource-intensive models by combining architectural simplicity, public availability, and modest computational requirements. Its accessible, open-source lightweight design lays the foundation for rapid integration into next-generation medical imaging tools that could transform LCS initiatives, allowing them to pivot from a singular focus on lung cancer detection to comprehensive respiratory disease management in high-risk populations.
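The abstract states that TANGERINE extends a masked autoencoder framework to 3D imaging but gives no implementation details here. As a rough illustration of the general idea only, the sketch below splits a volume into non-overlapping cubic patches, hides a random subset, and scores a reconstruction on the hidden patches alone. It is not the authors' code: the function names, the patch size of 16, and the 0.75 masking ratio (the default in the original 2D masked autoencoder work) are all assumptions chosen for the example.

```python
import numpy as np


def patchify_3d(volume, patch):
    """Split a (D, H, W) volume into non-overlapping cubes, one flattened cube per row."""
    d, h, w = volume.shape
    if d % patch or h % patch or w % patch:
        raise ValueError("each volume dimension must be divisible by the patch size")
    gd, gh, gw = d // patch, h // patch, w // patch
    # Separate grid indices from within-patch offsets, then group the grid axes first.
    x = volume.reshape(gd, patch, gh, patch, gw, patch)
    x = x.transpose(0, 2, 4, 1, 3, 5)
    return x.reshape(gd * gh * gw, patch ** 3)


def random_mask(num_patches, mask_ratio, rng):
    """Return sorted index arrays (visible, masked) for a random patch split."""
    num_keep = int(round(num_patches * (1.0 - mask_ratio)))
    perm = rng.permutation(num_patches)
    return np.sort(perm[:num_keep]), np.sort(perm[num_keep:])


def masked_mse(pred, target, mask_idx):
    """Mean squared error computed only over the masked (hidden) patches."""
    return float(np.mean((pred[mask_idx] - target[mask_idx]) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    volume = rng.standard_normal((32, 64, 64))   # stand-in for a CT sub-volume
    patches = patchify_3d(volume, patch=16)      # hypothetical patch size
    keep_idx, mask_idx = random_mask(len(patches), mask_ratio=0.75, rng=rng)
    visible = patches[keep_idx]                  # what an encoder would receive
    dummy_pred = np.zeros_like(patches)          # placeholder for a decoder output
    print(patches.shape, visible.shape, masked_mse(dummy_pred, patches, mask_idx))
```

In a design of this kind the encoder sees only the visible patches, so pretraining compute falls as the masking ratio rises. That is one plausible source of the modest computational requirements the abstract describes, though the abstract itself does not attribute them to this.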

Problem

Research questions and friction points this paper is trying to address.

Detect thoracic diseases in LDCT scans efficiently

Address radiologist shortage in large-scale scan interpretation

Provide open-source, low-resource model for diverse disease tasks

Innovation

Methods, ideas, or system contributions that make the work stand out.

Open-source lightweight model for LDCT analysis

Self-supervised pretraining on diverse datasets

Efficient fine-tuning with minimal GPU resources
