🤖 AI Summary
To address a clinical bottleneck in detecting actionable driver mutations in non-small cell lung cancer (NSCLC), namely the limited availability and long turnaround times of genetic testing, this study proposes a tissue-type-aware Asymmetric Transformer Decoder for multiple instance learning (MIL). The decoder uses queries and key-values of different dimensions to keep the query dimensionality low, which helps it extract information from patch embeddings efficiently while limiting overfitting. Tissue type is fed directly into the model so that predictions account for biological relevance rather than treating all regions, or only a fixed subset of regions, alike. Evaluated on six actionable driver mutations (ALK, BRAF, EGFR, ERBB2, KRAS, and MET ex14), the method outperforms top MIL models by an average of 3%, and by over 4% on rare mutations such as ERBB2 and BRAF, moving ML-based tests closer to being practical alternatives to standard genetic testing.
📝 Abstract
Identifying actionable driver mutations in non-small cell lung cancer (NSCLC) can impact treatment decisions and significantly improve patient outcomes. Despite guideline recommendations, broader adoption of genetic testing remains challenging due to limited availability and lengthy turnaround times. Machine Learning (ML) methods for Computational Pathology (CPath) offer a potential solution; however, research often focuses on only one or two common mutations, limiting the clinical value of these tools and the pool of patients who can benefit from them. This study evaluates various Multiple Instance Learning (MIL) techniques to detect six key actionable NSCLC driver mutations: ALK, BRAF, EGFR, ERBB2, KRAS, and MET ex14. Additionally, we introduce an Asymmetric Transformer Decoder model that employs queries and key-values of varying dimensions to maintain a low query dimensionality. This approach efficiently extracts information from patch embeddings and minimizes overfitting risks, proving highly adaptable to the MIL setting. Moreover, we present a method to directly utilize tissue type in the model, addressing a typical MIL limitation where either all regions or only some specific regions are analyzed, neglecting biological relevance. Our method outperforms top MIL models by an average of 3%, and over 4% when predicting rare mutations such as ERBB2 and BRAF, moving ML-based tests closer to being practical alternatives to standard genetic testing.
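The asymmetric attention idea described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify layer counts, projection shapes, or how tissue type enters the model. The sketch assumes a single cross-attention step in which a few low-dimensional latent queries attend over high-dimensional patch embeddings, and it assumes tissue type is injected by adding a per-tissue embedding to each patch. All names and sizes (`asymmetric_cross_attention`, `tissue_emb`, `d_q = 64`, `d_kv = 768`, random weights in place of trained ones) are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def asymmetric_cross_attention(queries, patches, w_k, w_v):
    """Cross-attention with low-dimensional queries (assumed design).

    queries: (n_q, d_q)        latent queries, small d_q
    patches: (n_patches, d_kv) patch embeddings, large d_kv
    w_k, w_v: (d_kv, d_q)      project keys/values down to the query width
    """
    keys = patches @ w_k                       # (n_patches, d_q)
    values = patches @ w_v                     # (n_patches, d_q)
    scores = queries @ keys.T / np.sqrt(queries.shape[-1])
    attn = softmax(scores, axis=-1)            # each query's weights over patches
    return attn @ values, attn                 # (n_q, d_q), (n_q, n_patches)

rng = np.random.default_rng(0)
n_patches, d_kv, d_q, n_q = 500, 768, 64, 4    # hypothetical sizes
n_tissue, n_mutations = 4, 6                   # six mutations, as in the abstract

patches = rng.normal(size=(n_patches, d_kv))   # stand-in for one slide's patch embeddings
tissue_ids = rng.integers(0, n_tissue, size=n_patches)

# Assumption: tissue type enters as an additive per-tissue embedding.
tissue_emb = rng.normal(scale=0.02, size=(n_tissue, d_kv))
patches_tt = patches + tissue_emb[tissue_ids]

queries = rng.normal(size=(n_q, d_q))
w_k = rng.normal(scale=d_kv ** -0.5, size=(d_kv, d_q))
w_v = rng.normal(scale=d_kv ** -0.5, size=(d_kv, d_q))

slide_repr, attn = asymmetric_cross_attention(queries, patches_tt, w_k, w_v)

# Assumption: mean-pool the queries, then one sigmoid output per mutation.
w_head = rng.normal(scale=d_q ** -0.5, size=(d_q, n_mutations))
logits = slide_repr.mean(axis=0) @ w_head
probs = 1.0 / (1.0 + np.exp(-logits))
```

In this sketch the queries contribute only `n_q * d_q` parameters, and the attention scores are computed in the small `d_q` space regardless of how wide the patch embeddings are, which is one plausible reading of "maintain a low query dimensionality".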