🤖 AI Summary
Early diagnosis of esophagogastric junction adenocarcinoma (EGJA) is highly operator-dependent, which leads to low detection rates and poor inter-observer consistency and motivates AI-assisted tools. This study introduces what the authors describe as the first vision foundation model framework for endoscopic staging of EGJA: an end-to-end diagnostic system that fuses global representations from DINOv2 with local fine-grained features from ResNet50. Evaluated on three multi-center test sets, the method achieves 88.95% to 92.56% accuracy, exceeding both the best representative AI baseline (ResNet50, 83.82% to 91.25%) and expert endoscopists (81.47% on the held-out test set). It also enhances diagnostic consistency across physician experience levels and improves early lesion detection. Key contributions include: (i) the first foundation-model-based framework tailored specifically for EGJA endoscopic staging; (ii) a novel global-local feature collaboration mechanism; and (iii) a clinically generalizable, high-accuracy integrated solution for both screening and staging.
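The global-local fusion idea can be illustrated with a minimal PyTorch sketch. It is not the authors' architecture: the summary does not specify how the two feature streams are combined, so plain concatenation followed by a small classifier is assumed here. The class name `ConcatFusionHead`, the hidden width of 256, and `num_classes=3` are hypothetical placeholders, and the feature sizes assume a DINOv2 ViT-B/14 embedding (768 dimensions) and ResNet50 pooled features (2048 dimensions). Random tensors stand in for the encoder outputs so the block runs without downloading weights.

```python
import torch
import torch.nn as nn


class ConcatFusionHead(nn.Module):
    """Hypothetical fusion head: concatenate global and local features, then classify.

    Assumption: concatenation is used only for illustration; the paper's actual
    fusion mechanism is not described in the summary.
    """

    def __init__(self, global_dim: int = 768, local_dim: int = 2048, num_classes: int = 3):
        super().__init__()
        fused_dim = global_dim + local_dim
        self.classifier = nn.Sequential(
            nn.LayerNorm(fused_dim),     # put both feature streams on a comparable scale
            nn.Linear(fused_dim, 256),   # hidden width is an arbitrary choice
            nn.GELU(),
            nn.Linear(256, num_classes), # num_classes is a placeholder, not from the paper
        )

    def forward(self, global_feat: torch.Tensor, local_feat: torch.Tensor) -> torch.Tensor:
        # Join the two feature vectors of each image along the feature dimension.
        fused = torch.cat([global_feat, local_feat], dim=-1)
        return self.classifier(fused)


torch.manual_seed(0)
head = ConcatFusionHead()

# Stand-ins for encoder outputs on a batch of 4 images.
global_feat = torch.randn(4, 768)   # e.g. a DINOv2 image-level embedding (assumed size)
local_feat = torch.randn(4, 2048)   # e.g. ResNet50 features after global average pooling

logits = head(global_feat, local_feat)
probs = logits.softmax(dim=-1)      # per-image class probabilities
```

In a real pipeline the two random tensors would be replaced by the outputs of the actual encoders; other fusion schemes, such as attention or gating, are equally compatible with the summary's description.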
📝 Abstract
The early detection of esophagogastric junction adenocarcinoma (EGJA) is crucial for improving patient prognosis, yet its current diagnosis is highly operator-dependent. This paper makes a first attempt to develop an artificial intelligence (AI) foundation model-based method for both screening and staging diagnosis of EGJA using endoscopic images. We conducted a multicentre cohort study across seven Chinese hospitals between December 28, 2016 and December 30, 2024. The dataset comprises 12,302 images from 1,546 patients; 8,249 images were used for model training, while the remaining 4,053 were divided into the held-out (112 patients, 914 images), external (230 patients, 1,539 images), and prospective (198 patients, 1,600 images) test sets for evaluation. The proposed model employs DINOv2 (a vision foundation model) and ResNet50 (a convolutional neural network) to extract features of the global appearance and local details of endoscopic images, respectively, for EGJA staging diagnosis. Our model performs well on EGJA staging diagnosis across the three test sets, achieving accuracies of 0.9256, 0.8895, and 0.8956, respectively. In contrast, the best of the representative AI models (ResNet50) achieves accuracies of 0.9125, 0.8382, and 0.8519 on the three test sets, respectively, and the expert endoscopists achieve an accuracy of 0.8147 on the held-out test set. Moreover, with the assistance of our model, the overall accuracy of the trainee, competent, and expert endoscopists improves from 0.7035, 0.7350, and 0.8147 to 0.8497, 0.8521, and 0.8696, respectively. To our knowledge, our model is the first application of foundation models to EGJA staging diagnosis, and it shows great potential for improving both diagnostic accuracy and efficiency.
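The split sizes and the assisted-reading gains reported above can be cross-checked with a short calculation. The sketch below uses only figures from the abstract; the variable names are illustrative, and the training patient count is derived under the assumption that no patient appears in more than one split, which the abstract does not state.

```python
# Totals and test-set sizes as reported in the abstract.
total_images, total_patients = 12302, 1546
test_sets = {
    "held-out": {"patients": 112, "images": 914},
    "external": {"patients": 230, "images": 1539},
    "prospective": {"patients": 198, "images": 1600},
}

test_images = sum(s["images"] for s in test_sets.values())
test_patients = sum(s["patients"] for s in test_sets.values())

# Images left for training; this should match the 8,249 reported.
train_images = total_images - test_images
# Derived, not reported: assumes patients do not overlap across splits.
train_patients = total_patients - test_patients

# Endoscopist accuracy without and with model assistance.
unaided = {"trainee": 0.7035, "competent": 0.7350, "expert": 0.8147}
assisted = {"trainee": 0.8497, "competent": 0.8521, "expert": 0.8696}

# Absolute accuracy gain per experience level.
gains = {level: round(assisted[level] - unaided[level], 4) for level in unaided}

print(f"training images: {train_images}, test images: {test_images}")
for level, gain in gains.items():
    print(f"{level}: +{gain:.4f}")
```

The gain shrinks as experience grows, from about 14.6 percentage points for trainees to about 5.5 for experts, and assisted accuracy falls within a band of roughly two percentage points (0.8497 to 0.8696) across the three groups.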