DentVLM: A Multimodal Vision-Language Model for Comprehensive Dental Diagnosis and Enhanced Clinical Practice

📅 2025-09-27
🤖 AI Summary
Problem: Current AI models struggle to integrate multimodal oral imaging data and to support complex clinical dental diagnosis. Method: We introduce the first large-scale bilingual multimodal dental dataset, comprising 110,447 images and 2.46 million vision-language question-answer pairs and covering seven 2D imaging modalities and 36 diagnostic tasks. We further propose a vision-language model architecture designed specifically for comprehensive dental diagnosis, featuring deep cross-modal understanding and clinical knowledge reasoning. Contribution/Results: In a clinical validation involving 25 dentists, the model outperformed junior clinicians on 21 of 36 tasks and senior clinicians on 12 of 36 tasks; when integrated into a collaborative workflow, it reduced diagnostic time by 15-22%. The framework helps narrow the expertise gap in dental care and showed promising performance in multi-agent collaborative interaction, home-based dental health management, and hospital-based intelligent diagnosis.

📝 Abstract
Diagnosing and managing oral diseases necessitate advanced visual interpretation across diverse imaging modalities and integrated information synthesis. While current AI models excel at isolated tasks, they often fall short in addressing the complex, multimodal requirements of comprehensive clinical dental practice. Here we introduce DentVLM, a multimodal vision-language model engineered for expert-level oral disease diagnosis. DentVLM was developed using a comprehensive, large-scale, bilingual dataset of 110,447 images and 2.46 million visual question-answering (VQA) pairs. The model is capable of interpreting seven 2D oral imaging modalities across 36 diagnostic tasks, significantly outperforming leading proprietary and open-source models by 19.6% higher accuracy for oral diseases and 27.9% for malocclusions. In a clinical study involving 25 dentists, evaluating 1,946 patients and encompassing 3,105 QA pairs, DentVLM surpassed the diagnostic performance of 13 junior dentists on 21 of 36 tasks and exceeded that of 12 senior dentists on 12 of 36 tasks. When integrated into a collaborative workflow, DentVLM elevated junior dentists' performance to senior levels and reduced diagnostic time for all practitioners by 15-22%. Furthermore, DentVLM exhibited promising performance across three practical utility scenarios, including home-based dental health management, hospital-based intelligent diagnosis and multi-agent collaborative interaction. These findings establish DentVLM as a robust clinical decision support tool, poised to enhance primary dental care, mitigate provider-patient imbalances, and democratize access to specialized medical expertise within the field of dentistry.
Problem

Research questions and friction points this paper is trying to address.

Developing multimodal AI for comprehensive dental diagnosis
Addressing limitations of current AI in clinical dental practice
Enhancing diagnostic accuracy and efficiency across oral diseases
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal vision-language model for expert dental diagnosis
Uses large bilingual dataset with visual question-answering pairs
Interprets seven oral imaging modalities across diagnostic tasks
Authors
Zijie Meng
Stomatology Hospital, School of Stomatology, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310016, Zhejiang, China.
Jin Hao
Assistant Professor, Shanghai Jiao Tong University
Stem Cell Biology, Neuroscience, Brain organoids
Xiwei Dai
Stomatology Hospital, School of Stomatology, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310016, Zhejiang, China.
Yang Feng
Angelalign Technology Inc., Shanghai 200082, China
Jiaxiang Liu
Zhejiang University
Multimodal Fusion, Medical Image Analysis
Bin Feng
Stomatology Hospital, School of Stomatology, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310016, Zhejiang, China.
Huikai Wu
Angelalign Technology Inc., Shanghai 200082, China
Xiaotang Gai
Zhejiang University
Hengchuan Zhu
College of Computer Science and Technology, Zhejiang University-University of Illinois Urbana-Champaign Institute, Zhejiang University, Hangzhou 310027, Zhejiang, China.
Tianxiang Hu
College of Computer Science and Technology, Zhejiang University-University of Illinois Urbana-Champaign Institute, Zhejiang University, Hangzhou 310027, Zhejiang, China.
Yangyang Wu
Zhejiang University
Large Language Model, Data Cleaning, Multi-modal Analysis
Hongxia Xu
Zhejiang University
AI4Science, Nanomedicine, Medical imaging
Jin Li
Department of Stomatology, The First Affiliated Hospital of Shenzhen University, Shenzhen Second People’s Hospital, Shenzhen 518035, China
Jun Xiao
College of Computer Science and Technology, Zhejiang University-University of Illinois Urbana-Champaign Institute, Zhejiang University, Hangzhou 310027, Zhejiang, China.
Xiaoqiang Liu
Department of Prosthodontics, Peking University School and Hospital of Stomatology, Beijing 100081, China
Joey Tianyi Zhou
A*STAR and NUS
Efficient AI, Robust & Safe AI
Fudong Zhu
Stomatology Hospital, School of Stomatology, Zhejiang University School of Medicine, Zhejiang University, Hangzhou 310016, Zhejiang, China.
Zhihe Zhao
State Key Laboratory of Oral Diseases, West China School of Stomatology, Sichuan University
Biomechanics, Orthodontics, Stem Cell Mechanobiology
Lunguo Xia
Department of Orthodontics, Shanghai Ninth People’s Hospital, College of Stomatology, Shanghai Jiao Tong University School of Medicine, Shanghai 200011, China.
Bing Fang
Department of Orthodontics, Shanghai Ninth People’s Hospital, College of Stomatology, Shanghai Jiao Tong University School of Medicine, Shanghai 200011, China.
Jimeng Sun
Professor at University of Illinois Urbana-Champaign
AI for healthcare, Machine learning for healthcare, deep learning for healthcare
Jian Wu
College of Computer Science and Technology, Zhejiang University-University of Illinois Urbana-Champaign Institute, Zhejiang University, Hangzhou 310027, Zhejiang, China.
Zuozhu Liu
Assistant Professor, Zhejiang University/University of Illinois Urbana-Champaign
deep learning, vision-language models, medical AI