OralGPT-Omni: A Versatile Dental Multimodal Large Language Model

📅 2025-11-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
Dental medicine lacks dedicated research on multimodal large language models (MLLMs), hindered by scarce annotated data, fragmented modality modeling, and insufficient clinical trustworthiness. Method: We introduce TRACE-CoT, a clinical chain-of-thought reasoning dataset, together with a four-stage progressive training paradigm used to build OralGPT-Omni, a dental-specialized MLLM. We also release MMOral-Uni, the first unified multimodal dental evaluation benchmark, covering five imaging modalities (e.g., X-ray, CBCT) and five clinical tasks (e.g., diagnosis, segmentation). The approach integrates chain-of-thought supervision, cross-modal alignment, and expert-annotated data augmentation. Contribution/Results: This framework substantially improves model interpretability and generalization. OralGPT-Omni scores 51.84 on the MMOral-Uni benchmark and 45.31 on the MMOral-OPG benchmark, clearly outperforming prior approaches (including GPT-5) and advancing dental AI toward clinical deployment.

📝 Abstract
Multimodal Large Language Models (MLLMs) have exhibited immense potential across numerous medical specialties; yet dentistry remains underexplored, in part due to limited domain-specific data, scarce dental expert annotations, insufficient modality-specific modeling, and challenges in reliability. In this paper, we present OralGPT-Omni, the first dental-specialized MLLM designed for comprehensive and trustworthy analysis across diverse dental imaging modalities and clinical tasks. To explicitly capture dentists' diagnostic reasoning, we construct TRACE-CoT, a clinically grounded chain-of-thought dataset that mirrors dental radiologists' decision-making processes. This reasoning supervision, combined with our proposed four-stage training paradigm, substantially strengthens the model's capacity for dental image understanding and analysis. In parallel, we introduce MMOral-Uni, the first unified multimodal benchmark for dental image analysis. It comprises 2,809 open-ended question-answer pairs spanning five modalities and five tasks, offering the most comprehensive evaluation suite to date for MLLMs in digital dentistry. OralGPT-Omni achieves an overall score of 51.84 on the MMOral-Uni benchmark and 45.31 on the MMOral-OPG benchmark, dramatically outperforming GPT-5. Our work promotes intelligent dentistry and paves the way for future advances in dental image analysis. All code, benchmarks, and models will be made publicly available.
Problem

Research questions and friction points this paper is trying to address.

Dentistry lacks specialized multimodal AI models for comprehensive dental image analysis.
Limited dental data and expert annotations hinder reliable AI applications in dentistry.
No unified benchmark exists to evaluate multimodal AI performance across dental tasks.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Developed OralGPT-Omni, a multimodal model for dental image analysis.
Created the TRACE-CoT dataset to mimic dentists' diagnostic reasoning processes.
Introduced the MMOral-Uni benchmark for evaluating dental multimodal models.
👥 Authors

Jing Hao — Faculty of Dentistry, The University of Hong Kong
Yuci Liang — College of Computer Science and Software Engineering, Shenzhen University
Lizhuo Lin — Faculty of Dentistry, The University of Hong Kong
Yuxuan Fan — Peking University (Natural Language Processing)
Wenkai Zhou — Faculty of Dentistry, The University of Hong Kong
Kaixin Guo — Faculty of Dentistry, The University of Hong Kong
Zanting Ye — Southern Medical University (Deep Learning, Medical Image Analysis, VLM)
Yanpeng Sun — Nanjing University of Science and Technology (Computer Vision, Deep Learning, Multimedia)
Xinyu Zhang — University of Auckland
Yanqi Yang — Faculty of Dentistry, The University of Hong Kong
Qiankun Li — Research Fellow @ NTU, Ph.D. @ USTC (MLLM, AI4Health, Computer Vision, Pattern Recognition, Trustworthy AI)
Hao Tang — School of Computer Science, Peking University
James Kit-Hon Tsoi — Faculty of Dentistry, The University of Hong Kong
Linlin Shen — Shenzhen University (Deep Learning, Computer Vision, Facial Analysis/Recognition, Medical Image Analysis)
Kuo Feng Hung — Faculty of Dentistry, The University of Hong Kong