Development and validation of an AI foundation model for endoscopic diagnosis of esophagogastric junction adenocarcinoma: a cohort and deep learning study

📅 2025-09-22

🤖 AI Summary

Early diagnosis of esophagogastric junction adenocarcinoma (EGJA) is highly operator-dependent, which leads to low detection rates and poor inter-observer consistency and motivates AI-assisted tools. This study presents what the authors describe as the first foundation-model-based method for screening and staging diagnosis of EGJA from endoscopic images. The model combines global appearance features from DINOv2 (a vision foundation model) with local detail features from ResNet50 (a convolutional neural network). On the held-out, external, and prospective test sets it reaches accuracies of 92.56%, 88.95%, and 89.56%, respectively, exceeding the best comparison AI model (ResNet50: 91.25%, 83.82%, and 85.19%) and the expert endoscopists on the held-out set (81.47%). With model assistance, accuracy rises for trainee, competent, and expert endoscopists alike, and the gap between experience levels narrows. Key contributions include: (i) the first foundation-model-based framework aimed specifically at EGJA endoscopic staging; (ii) a combination of global and local image features; and (iii) a single solution for both screening and staging, evaluated across multiple centres.

📝 Abstract

The early detection of esophagogastric junction adenocarcinoma (EGJA) is crucial for improving patient prognosis, yet its current diagnosis is highly operator-dependent. This paper makes a first attempt to develop an artificial intelligence (AI) foundation model-based method for both screening and staging diagnosis of EGJA using endoscopic images. In this cohort and deep learning study, we conducted a multicentre study across seven Chinese hospitals between December 28, 2016 and December 30, 2024. The dataset comprises 12,302 images from 1,546 patients; 8,249 images were used for model training, while the remaining images were divided into the held-out (112 patients, 914 images), external (230 patients, 1,539 images), and prospective (198 patients, 1,600 images) test sets for evaluation. The proposed model employs DINOv2 (a vision foundation model) and ResNet50 (a convolutional neural network) to extract features of the global appearance and local details of endoscopic images for EGJA staging diagnosis. Our model performs well for EGJA staging diagnosis across the three test sets, achieving accuracies of 0.9256, 0.8895, and 0.8956, respectively. In contrast, the best of the representative AI models (ResNet50) achieves accuracies of 0.9125, 0.8382, and 0.8519 on the three test sets, respectively, and the expert endoscopists achieve an accuracy of 0.8147 on the held-out test set. Moreover, with the assistance of our model, the overall accuracy of the trainee, competent, and expert endoscopists improves from 0.7035, 0.7350, and 0.8147 to 0.8497, 0.8521, and 0.8696, respectively. To our knowledge, our model is the first application of foundation models to EGJA staging diagnosis, and it demonstrates great potential in both diagnostic accuracy and efficiency.
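The split sizes and the assistance gains quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only figures from the abstract; the variable names are illustrative, and the derived training-patient count assumes that no patient appears in more than one split, which the abstract does not state.

```python
# Figures copied from the abstract; all names here are illustrative.
TOTAL_IMAGES = 12_302
TOTAL_PATIENTS = 1_546

# (patients, images) per test set, as reported.
test_sets = {
    "held_out": (112, 914),
    "external": (230, 1_539),
    "prospective": (198, 1_600),
}

test_patients = sum(patients for patients, _ in test_sets.values())
test_images = sum(images for _, images in test_sets.values())

# Images left for training; this should match the 8,249 quoted in the abstract.
train_images = TOTAL_IMAGES - test_images

# Derived value, NOT stated in the abstract: assumes patient-disjoint splits.
train_patients = TOTAL_PATIENTS - test_patients

# Endoscopist accuracy without and with model assistance.
unaided = {"trainee": 0.7035, "competent": 0.7350, "expert": 0.8147}
assisted = {"trainee": 0.8497, "competent": 0.8521, "expert": 0.8696}
gains = {level: round(assisted[level] - unaided[level], 4) for level in unaided}

print(train_images, train_patients)
print(gains)
```

The subtraction confirms that the 8,249 figure counts images rather than patients, and the gains show that the improvement is largest for trainees and smallest for experts.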

Problem

Research questions and friction points this paper is trying to address.

Developing an AI foundation model for endoscopic EGJA diagnosis

Addressing operator-dependent diagnosis of esophagogastric junction adenocarcinoma

Creating automated screening and staging system using endoscopic images

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses DINOv2 foundation model for endoscopic image analysis

Combines global appearance and local detail features

First foundation model application for EGJA staging diagnosis
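To make the global-plus-local idea concrete, here is a minimal sketch of one way two feature vectors could be combined and classified. It is not the authors' implementation: the summary does not say how the features are fused, so concatenation followed by a linear layer is an assumption. The random vectors stand in for real DINOv2 and ResNet50 outputs, and the feature sizes, the three-class output, and every function name are hypothetical.

```python
import math
import random


def concat_fusion(global_feat, local_feat):
    """Fuse two feature vectors by concatenation (an assumed fusion rule)."""
    return list(global_feat) + list(local_feat)


def linear(x, weights, bias):
    """Apply a dense layer: one output per (weight row, bias) pair."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)

# Toy sizes chosen for readability, not taken from the paper.
GLOBAL_DIM, LOCAL_DIM, NUM_CLASSES = 8, 16, 3

# Random stand-ins for the two backbones' outputs (hypothetical).
global_feat = [rng.gauss(0, 1) for _ in range(GLOBAL_DIM)]  # e.g. a DINOv2 embedding
local_feat = [rng.gauss(0, 1) for _ in range(LOCAL_DIM)]    # e.g. pooled ResNet50 features

fused = concat_fusion(global_feat, local_feat)

# Untrained classification head with random weights, for shape illustration only.
weights = [[rng.gauss(0, 0.1) for _ in range(len(fused))] for _ in range(NUM_CLASSES)]
bias = [0.0] * NUM_CLASSES

probs = softmax(linear(fused, weights, bias))
predicted_class = max(range(NUM_CLASSES), key=probs.__getitem__)
print(len(fused), predicted_class)
```

Concatenation keeps both feature sets intact and lets the classifier weight them freely. Other fusion rules, such as attention or gating, would slot into `concat_fusion` without changing the rest of the sketch.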
