AI Summary
This work addresses the limitations of existing computational pathology approaches, which often rely on task-specific models, lack subspecialty-level depth, and have not been prospectively validated within real-world clinical workflows. The authors propose PulmoFoundation, a foundation model for lung pathology assessment across preoperative, intraoperative, and postoperative care. Built on Virchow2 through subspecialty-specific pretraining on approximately 40,000 H&E-stained whole-slide images, it was evaluated on approximately 26,000 whole-slide images across 32 clinical tasks. In a registered multicenter prospective study of 1,357 patients covering 11 diagnostic tasks, PulmoFoundation achieved an average AUC of 92.3%. At pre-specified triage thresholds, it could spare 68.8% of biopsies and 83.0% of frozen sections from additional second review and defer 44.5% of immunohistochemistry orders. In a separate crossover randomized controlled trial with eight pathologists, AI assistance raised diagnostic accuracy from 83.8% to 91.7%, improved inter-rater agreement (kappa from 0.56 to 0.76), and reduced median diagnostic time by 19.6%.
Abstract
Pathological assessment guides lung cancer diagnosis, treatment selection, and prognostic evaluation, yet current computational pathology (CPath) approaches rely on task-specific models for isolated objectives. Although pan-cancer foundation models offer versatility, they lack subspecialty-level depth and have not been evaluated across clinical workflows or prospectively validated in real-world settings. We introduce PulmoFoundation, a multi-center, prospectively validated foundation model, also evaluated in a randomized controlled trial (RCT), for comprehensive lung pathology assessment across pre-operative, intra-operative, and post-operative care. Built upon Virchow2 via subspecialty-specific pretraining using ~40,000 diagnostic H&E-stained whole-slide images (WSIs), PulmoFoundation was systematically evaluated on ~26,000 WSIs across 32 clinically relevant tasks. In addition to accurately predicting molecular markers and patient survival, our model achieves clinical-grade performance in core diagnostic tasks across biopsy, frozen section, and surgical resection slides. In a registered prospective study of 1,357 patients across 11 diagnostic tasks, our model achieved an average AUC of 92.3%. Using pre-specified triage thresholds, PulmoFoundation could spare 68.8% of biopsies and 83.0% of frozen sections from additional second review, and defer 44.5% of immunohistochemistry (IHC) stain orders, with positive predictive values (PPVs) of 1.0, 0.991, and 0.966, respectively. Beyond prospective validation, we conducted a crossover RCT with eight pathologists, in which AI assistance improved diagnostic accuracy across 4,928 case-reader pairs (91.7% with AI vs. 83.8% without AI). AI assistance also reduced median diagnostic time by 19.6%, increased diagnostic confidence by 8.7%, and improved inter-rater agreement from moderate (kappa = 0.56) to substantial (kappa = 0.76). Together, these evaluations support PulmoFoundation as a clinically validated decision-support system for lung pathology.
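The triage figures above pair a coverage number (the share of cases that clear a threshold) with a PPV (how often those cleared calls are correct). The abstract does not describe how the thresholds are applied, so the following is only a minimal sketch under stated assumptions: each case has a single model score for the positive class, cases at or above a fixed threshold are triaged, and PPV is computed among the triaged cases only. The names `TriageSummary` and `triage_summary`, and the toy scores, are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class TriageSummary:
    triaged_fraction: float  # share of all cases at or above the threshold
    ppv: Optional[float]     # TP / (TP + FP) among triaged cases; None if none triaged


def triage_summary(scores: Sequence[float],
                   labels: Sequence[int],
                   threshold: float) -> TriageSummary:
    """Summarize a fixed-threshold triage rule (illustrative assumption only).

    scores: model score for the positive class, one per case.
    labels: reference diagnosis, 1 = positive, 0 = negative.
    """
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and equal length")

    # Reference labels of the cases the rule would triage.
    triaged = [y for s, y in zip(scores, labels) if s >= threshold]
    fraction = len(triaged) / len(scores)
    ppv = sum(triaged) / len(triaged) if triaged else None
    return TriageSummary(triaged_fraction=fraction, ppv=ppv)


# Toy data, invented purely for illustration.
toy_scores = [0.99, 0.97, 0.95, 0.80, 0.60, 0.30, 0.10, 0.98]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
summary = triage_summary(toy_scores, toy_labels, threshold=0.9)
print(summary)
```

On the toy data, four of the eight cases clear the threshold and three of those four are true positives, so coverage and PPV move in opposite directions as the threshold changes. Fixing the threshold before the prospective study, as the abstract states was done, keeps that trade-off from being tuned on the evaluation data.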