OralGPT-Omni: A Versatile Dental Multimodal Large Language Model

📅 2025-11-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
Dental medicine lacks dedicated research on multimodal large language models (MLLMs), hindered by scarce annotated data, fragmented modality modeling, and insufficient clinical trustworthiness. Method: We introduce TRACE-CoT, a clinical chain-of-thought reasoning dataset, together with a four-stage progressive training paradigm used to build OralGPT-Omni, a dental-specialized MLLM. We also release MMOral-Uni, the first unified multimodal dental evaluation benchmark, covering five imaging modalities (e.g., X-ray, CBCT) and five clinical tasks (e.g., diagnosis, segmentation). The approach integrates chain-of-thought supervision, cross-modal alignment, and expert-annotated data augmentation. Contribution/Results: This framework substantially improves model interpretability and generalization. OralGPT-Omni scores 51.84 on the MMOral-Uni benchmark and 45.31 on the MMOral-OPG benchmark, clearly outperforming prior approaches (including GPT-5) and advancing dental AI toward clinical deployment.

📝 Abstract
Multimodal Large Language Models (MLLMs) have exhibited immense potential across numerous medical specialties; yet dentistry remains underexplored, in part due to limited domain-specific data, scarce dental expert annotations, insufficient modality-specific modeling, and challenges in reliability. In this paper, we present OralGPT-Omni, the first dental-specialized MLLM designed for comprehensive and trustworthy analysis across diverse dental imaging modalities and clinical tasks. To explicitly capture dentists' diagnostic reasoning, we construct TRACE-CoT, a clinically grounded chain-of-thought dataset that mirrors dental radiologists' decision-making processes. This reasoning supervision, combined with our proposed four-stage training paradigm, substantially strengthens the model's capacity for dental image understanding and analysis. In parallel, we introduce MMOral-Uni, the first unified multimodal benchmark for dental image analysis. It comprises 2,809 open-ended question-answer pairs spanning five modalities and five tasks, offering the most comprehensive evaluation suite to date for MLLMs in digital dentistry. OralGPT-Omni achieves an overall score of 51.84 on the MMOral-Uni benchmark and 45.31 on the MMOral-OPG benchmark, dramatically outperforming GPT-5. Our work promotes intelligent dentistry and paves the way for future advances in dental image analysis. All code, benchmarks, and models will be made publicly available.
Problem

Research questions and friction points this paper is trying to address.

Dentistry lacks specialized multimodal AI models for comprehensive dental image analysis.
Limited dental data and expert annotations hinder reliable AI applications in dentistry.
No unified benchmark exists to evaluate multimodal AI performance across dental tasks.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Developed OralGPT-Omni, a multimodal model for dental image analysis.
Created the TRACE-CoT dataset to mimic dentists' diagnostic reasoning processes.
Introduced the MMOral-Uni benchmark for evaluating dental multimodal models.
👥 Authors

Jing Hao — Faculty of Dentistry, The University of Hong Kong
Yuci Liang — College of Computer Science and Software Engineering, Shenzhen University
Lizhuo Lin — Faculty of Dentistry, The University of Hong Kong
Yuxuan Fan — Peking University (Natural Language Processing)
Wenkai Zhou — Faculty of Dentistry, The University of Hong Kong
Kaixin Guo — Faculty of Dentistry, The University of Hong Kong
Zanting Ye — Southern Medical University (Deep Learning, Medical Image Analysis, VLM)
Yanpeng Sun — Nanjing University of Science and Technology (Computer Vision, Deep Learning, Multimedia)
Xinyu Zhang — University of Auckland
Yanqi Yang — Faculty of Dentistry, The University of Hong Kong
Qiankun Li — Research Fellow @ NTU, Ph.D. @ USTC (MLLM, AI4Health, Computer Vision, Pattern Recognition, Trustworthy AI)
Hao Tang — School of Computer Science, Peking University
James Kit-Hon Tsoi — Faculty of Dentistry, The University of Hong Kong
Linlin Shen — Shenzhen University (Deep Learning, Computer Vision, Facial Analysis/Recognition, Medical Image Analysis)
Kuo Feng Hung — Faculty of Dentistry, The University of Hong Kong