Vision Language Models versus Machine Learning Models Performance on Polyp Detection and Classification in Colonoscopy Images

📅 2025-03-27
🤖 AI Summary
Problem: Evaluating the diagnostic efficacy of vision-language models (VLMs) for polyp detection (CADe) and classification (CADx) in colonoscopy images is difficult because standardized, pathology-annotated benchmarks and systematic comparisons against conventional models are lacking. Method: The study benchmarks general-purpose VLMs (GPT-4, Gemini-1.5-Pro, Claude-3-Opus) and contrastive vision-language encoders (BiomedCLIP, CLIP) against a CNN (ResNet50) and classic machine learning models (e.g., SVM, random forest) on a clinical dataset of 2,258 pathology-confirmed colonoscopy images. Contribution/Results: ResNet50 achieves the highest CADe performance (F1 = 91.35%, AUROC = 0.98), followed by BiomedCLIP (F1 = 88.68%). Among general-purpose VLMs, GPT-4 performs best in both CADe (F1 = 81.02%) and CADx (weighted F1 = 41.18%). The findings suggest that VLMs such as BiomedCLIP and GPT-4 may be useful for polyp detection where training a CNN is not feasible.

📝 Abstract
Introduction: This study provides a comprehensive performance assessment of vision-language models (VLMs) against established convolutional neural networks (CNNs) and classic machine learning models (CMLs) for computer-aided detection (CADe) and computer-aided diagnosis (CADx) of colonoscopy polyp images. Method: We analyzed 2,258 colonoscopy images with corresponding pathology reports from 428 patients. We preprocessed all images using standardized techniques (resizing, normalization, and augmentation) and implemented a rigorous comparative framework evaluating 11 distinct models: ResNet50, four CMLs (random forest, support vector machine, logistic regression, decision tree), two specialized contrastive vision-language encoders (CLIP, BiomedCLIP), and three general-purpose VLMs (GPT-4, Gemini-1.5-Pro, Claude-3-Opus). Our performance assessment focused on two clinical tasks: polyp detection (CADe) and classification (CADx). Result: In polyp detection, ResNet50 achieved the best performance (F1: 91.35%, AUROC: 0.98), followed by BiomedCLIP (F1: 88.68%, AUROC: [AS1]). GPT-4 demonstrated comparable effectiveness to traditional machine learning approaches (F1: 81.02%, AUROC: [AS2]), outperforming other general-purpose VLMs. For polyp classification, performance rankings remained consistent but with lower overall metrics. ResNet50 maintained the highest efficacy (weighted F1: 74.94%), while GPT-4 demonstrated moderate capability (weighted F1: 41.18%), significantly exceeding other VLMs (Claude-3-Opus weighted F1: 25.54%, Gemini-1.5-Pro weighted F1: 6.17%). Conclusion: CNNs remain superior for both CADx and CADe tasks. However, VLMs like BiomedCLIP and GPT-4 may be useful for polyp detection tasks where training CNNs is not feasible.
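The abstract reports F1 and AUROC for detection and weighted F1 for classification. As a reading aid, here is a minimal pure-Python sketch of how these three metrics are commonly defined. It is not the authors' evaluation code; the function names, the toy labels, and the class names ("adenoma", "hyperplastic", "serrated") are assumptions chosen only for illustration.

```python
from collections import Counter


def f1_for_class(y_true, y_pred, positive):
    """F1 for one class, treating `positive` as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: F1 is defined as 0 here
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's share of y_true."""
    support = Counter(y_true)
    total = len(y_true)
    return sum(f1_for_class(y_true, y_pred, c) * n / total
               for c, n in support.items())


def auroc(y_true, scores):
    """AUROC as the chance that a random positive outscores a random negative.

    Labels are 1 (polyp) or 0 (no polyp); tied scores count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy detection example (hypothetical labels and scores, not study data).
detect_true = [1, 1, 1, 0, 0, 1]
detect_pred = [1, 1, 0, 0, 1, 1]
detect_score = [0.9, 0.8, 0.4, 0.3, 0.6, 0.7]

# Toy classification example with hypothetical class names.
class_true = ["adenoma", "adenoma", "adenoma",
              "hyperplastic", "hyperplastic", "serrated"]
class_pred = ["adenoma", "adenoma", "hyperplastic",
              "hyperplastic", "adenoma", "serrated"]

if __name__ == "__main__":
    print(f1_for_class(detect_true, detect_pred, positive=1))
    print(auroc(detect_true, detect_score))
    print(weighted_f1(class_true, class_pred))
```

Weighted F1 gives frequent classes more influence than rare ones, which matters when polyp types are imbalanced. AUROC needs a continuous score per image rather than a hard label.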
Problem

Research questions and friction points this paper is trying to address.

Compare VLMs and ML models for polyp detection in colonoscopy images
Evaluate performance of 11 models on CADe and CADx tasks
Assess ResNet50, BiomedCLIP, and GPT-4 for polyp classification accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Used vision-language models for polyp detection
Compared VLMs with CNNs and classic ML
Standardized image preprocessing and augmentation techniques
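The preprocessing steps named in the abstract (resizing, normalization, augmentation) can be sketched as follows. This is an illustrative assumption, not the authors' pipeline: the source does not state the target size, the normalization constants, or which augmentations were used. The sketch uses a grayscale image stored as nested lists, nearest-neighbor resizing, and a random horizontal flip, and all function names are hypothetical.

```python
import random


def resize_nearest(img, out_h, out_w):
    """Nearest-neighbor resize of a 2D list of pixel values."""
    in_h, in_w = len(img), len(img[0])
    return [[img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]


def normalize(img, mean=0.5, std=0.5):
    """Scale 0..255 pixels to 0..1, then shift and scale by mean and std."""
    return [[(px / 255.0 - mean) / std for px in row] for row in img]


def random_hflip(img, rng, p=0.5):
    """Mirror the image left-to-right with probability p (augmentation)."""
    if rng.random() < p:
        return [row[::-1] for row in img]
    return img


def preprocess(img, size, rng, train=True):
    """Resize, optionally augment (training only), then normalize."""
    out = resize_nearest(img, size, size)
    if train:
        out = random_hflip(out, rng)
    return normalize(out)


if __name__ == "__main__":
    rng = random.Random(0)          # fixed seed so runs are repeatable
    tiny = [[0, 255], [255, 0]]     # hypothetical 2x2 grayscale image
    for row in preprocess(tiny, size=4, rng=rng, train=False):
        print(row)
```

Augmentation is applied only when `train=True`, so evaluation images pass through the same deterministic resize and normalization every time.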
Mohammad Amin Khalafi
Research Fellow at Research Institute for Gastroenterology and Liver Diseases
AI, LLM, Internal Medicine
Seyed Amir Ahmad Safavi-Naini
Research Fellow at Research Institute for Gastroenterology and Liver Diseases
Gastrointestinal Cancer, Pancreatic Cancer, Cancer Prevention, Precision Medicine
Ameneh Salehi
Research Institute for Gastroenterology and Liver Diseases, Shahid Beheshti University of Medical Sciences, Tehran, Iran
Nariman Naderi
MD, Shahid Beheshti University of Medical Sciences
AI, computer vision, LLM, medicine
Dorsa Alijanzadeh
Research Institute for Gastroenterology and Liver Diseases, Shahid Beheshti University of Medical Sciences, Tehran, Iran
P. K. Moghadam
Research Institute for Gastroenterology and Liver Diseases, Shahid Beheshti University of Medical Sciences, Tehran, Iran
Kaveh Kavosi
Laboratory of Complex Biological Systems and Bioinformatics (CBB), Department of Bioinformatics, Institute of Biochemistry and Biophysics (IBB), University of Tehran, Tehran, Iran
Negar Golestani
Division of Data-Driven and Digital Health (D3M), The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, NY, USA
S. Shahrokh
Research Institute for Gastroenterology and Liver Diseases, Shahid Beheshti University of Medical Sciences, Tehran, Iran
Soltanali Fallah
Department of GI Diseases, Tehran Milad Hospital, Tehran, Iran
Jamil S Samaan
Karsh Division of Gastroenterology and Hepatology, Cedars-Sinai Medical Center, Los Angeles, CA
Nicholas P. Tatonetti
Department of Computational Biomedicine, Cedars-Sinai Medical Center, West Hollywood, California, USA; Cedars-Sinai Cancer, Cedars-Sinai Medical Center, 8700 Beverly Blvd. Los Angeles, CA, USA; Department of Biomedical Informatics, Columbia University, New York, New York, USA
Nicholas A. Hoerter
Henry D. Janowitz Division of Gastroenterology, Icahn School of Medicine at Mount Sinai, New York, New York, USA
Girish Nadkarni
Icahn School of Medicine at Mount Sinai
Hypertension, Genetics, Kidney Disease, AI, Machine Learning
H. A. Aghdaei
Research Institute for Gastroenterology and Liver Diseases, Shahid Beheshti University of Medical Sciences, Tehran, Iran
Ali Soroush
Icahn School of Medicine at Mount Sinai
Gastroenterology, Artificial Intelligence, Machine Learning