🤖 AI Summary
This study addresses limited access to molecular testing for EGFR mutations in lung adenocarcinoma (LUAD) patients from Asian populations with high EGFR mutation prevalence, such as India's. We propose a weakly supervised deep learning framework built on a pretrained histopathology foundation model. Methodologically, it combines a vision transformer (ViT) backbone with attention-based multiple instance learning (ABMIL) and transfer learning to predict EGFR mutation status directly from H&E-stained whole-slide images (WSIs), using only slide-level labels. Key contributions include: (i) an adaptation of a pathology foundation model to EGFR mutation prediction in LUAD; and (ii) a lightweight, data-efficient architecture (trained on 170 WSIs) that generalizes across institutions. The model achieves AUCs of 0.933 on an internal test set and 0.965 on the external TCGA test set, outperforming several prior approaches, and points toward a deployable AI-assisted diagnostic tool for resource-limited settings.
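The ABMIL step can be illustrated with a minimal sketch. It assumes tile embeddings have already been extracted by the foundation-model ViT (replaced here by toy 2-D vectors) and uses the non-gated attention pooling of the original ABMIL formulation (Ilse et al., 2018); the source does not say whether the authors use this or a gated variant. The weights `V`, `w`, `c`, `b` and the helper names `abmil_pool` and `predict_egfr_prob` are hypothetical; in practice all weights are learned from slide-level labels.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply matrix m (rows x cols) by vector v (length cols)."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def abmil_pool(tiles: Matrix, V: Matrix, w: Vector) -> Tuple[Vector, Vector]:
    """Non-gated attention-based MIL pooling (sketch, hypothetical helper).

    tiles: one embedding per tile, assumed precomputed by a frozen ViT.
    V:     hidden_dim x embed_dim projection matrix (hypothetical weights).
    w:     hidden_dim attention vector (hypothetical weights).
    Returns (attention weights, slide-level embedding).
    """
    # Unnormalized attention score for each tile: w^T tanh(V h_k)
    scores = [sum(wi * math.tanh(hi) for wi, hi in zip(w, matvec(V, h))) for h in tiles]
    # Numerically stable softmax over all tiles of the slide
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    attn = [e / total for e in exps]
    # Slide embedding = attention-weighted average of tile embeddings
    dim = len(tiles[0])
    slide = [sum(a * h[d] for a, h in zip(attn, tiles)) for d in range(dim)]
    return attn, slide


def predict_egfr_prob(slide: Vector, c: Vector, b: float) -> float:
    """Hypothetical linear head + sigmoid giving P(EGFR-mutant) for one slide."""
    logit = sum(ci * zi for ci, zi in zip(c, slide)) + b
    return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    # Toy slide: 3 tiles with 2-D embeddings (real embeddings are much larger)
    tiles = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
    V = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical, normally learned
    w = [1.0, 1.0]                # hypothetical, normally learned
    attn, slide = abmil_pool(tiles, V, w)
    print("attention:", [round(a, 3) for a in attn])
    print("P(EGFR-mutant):", round(predict_egfr_prob(slide, [0.5, 0.5], -1.0), 3))
```

Each tile receives a softmax attention weight and the slide embedding is their weighted average, which is why only a slide-level mutation label is needed for training; the same weights can be rendered as a heatmap to show which tiles drove the prediction.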
📝 Abstract
Lung adenocarcinoma (LUAD) is a subtype of non-small cell lung cancer (NSCLC). LUAD harboring mutations in the EGFR gene accounts for approximately 46% of LUAD cases. Patients carrying EGFR mutations can be treated with EGFR-targeted tyrosine kinase inhibitors (TKIs); hence, predicting EGFR mutation status can aid clinical decision making. EGFR mutations are especially common in Southeast Asian populations, whose incidence is significantly higher than that of Caucasians (39-64% vs 7-22%). H&E-stained whole slide imaging (WSI) is routinely performed for cancer staging and subtyping. Recent progress in AI models has shown promising results in cancer detection and classification. In this study, we propose a deep learning (DL) framework built on a vision transformer (ViT)-based pathology foundation model and an attention-based multiple instance learning (ABMIL) architecture to predict EGFR mutation status from H&E WSIs. The pipeline was trained on data from an Indian cohort (170 WSIs) and evaluated on two held-out test sets: an internal test set (30 WSIs from the Indian cohort) and an external test set from TCGA (86 WSIs). The model performs consistently across both datasets, with AUCs of 0.933 (±0.010) and 0.965 (±0.015) on the internal and external test sets, respectively. The proposed framework can be trained efficiently on small datasets and outperforms several prior studies, irrespective of their training domain. This study demonstrates the feasibility of accurately predicting EGFR mutation status from routine pathology slides using foundation models and attention-based multiple instance learning, particularly in resource-limited settings.
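To make the reported metric concrete, here is a small sketch of ROC AUC computed as the probability that a randomly chosen EGFR-mutant slide scores higher than a randomly chosen wild-type slide, with ties counted as half. The abstract does not state how the ± values were obtained; the hypothetical `summarize` helper assumes they are a sample standard deviation over repeated runs, which is only one plausible reading. The labels and scores below are toy values, not study data.

```python
import statistics
from typing import List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via pairwise comparison (normalized Mann-Whitney U statistic).

    labels: 1 = EGFR-mutant, 0 = wild-type; scores: predicted P(mutant) per slide.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative slide")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # mutant slide ranked above wild-type slide
            elif p == n:
                wins += 0.5  # tie counts as half a correct ranking
    # O(P*N) pairs, which is fine for test sets of tens of slides
    return wins / (len(pos) * len(neg))


def summarize(aucs: List[float]) -> Tuple[float, float]:
    """Mean and sample std of AUCs across runs (assumed meaning of '±')."""
    return statistics.mean(aucs), statistics.stdev(aucs)


if __name__ == "__main__":
    # Toy labels and scores, not data from the study
    labels = [1, 1, 0, 0, 1, 0]
    scores = [0.9, 0.8, 0.3, 0.6, 0.55, 0.1]
    print(f"AUC = {roc_auc(labels, scores):.3f}")
```

With the toy data, 8 of the 9 mutant/wild-type pairs are ranked correctly, so the script prints `AUC = 0.889`. Because AUC depends only on ranking, it is insensitive to the choice of decision threshold.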